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Preface 

For many years, the human being has been trying, in all ways, to recreate the com- 
plex mechanisms that form the human body. Such task is extremely complicated 
and the results are not totally satisfactory. However, with increasing technological 
advances based on theoretical and experimental researches, man gets, in a way, to 
copy or to imitate some systems of the human body. 

These researches not only intended to create humanoid robots, great part of them 
constituting autonomous systems, but also, in some way, to offer a higher knowl- 
edge of the systems that form the human body, objectifying possible applications 
in the technology of rehabilitation of human beings, gathering in a whole studies 
related not only to Robotics, but also to Biomechanics, Biomimmetics, Cybernetics, 
among other areas. 

This book presents a series of researches inspired by this ideal, carried through by 
various researchers worldwide, looking for to analyze and to discuss diverse sub- 
jects related to humanoid robots. The presented contributions explore aspects 
about robotic hands, learning, language, vision and locomotion. 

From the great number of interesting information presented here, I believe that this 
book can offer some aid in new research, as well as stimulating the interest of peo- 
ple for this area of study related to the humanoid robots. 



Editor 
Armando Carlos de Pina Filho 
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Design of Modules and Components for 

Humanoid Robots 

Albert Albers, Sven Brudniok, Jens Ottnad, 
Christian Sauter, Korkiat Sedchaicharn 

University of Karlsruhe (TH), Institute of Product Development 

Germany 



1. Introduction 

The development of a humanoid robot in the collaborative research centre 588 has the 
objective of creating a machine that closely cooperates with humans. The 
collaborative research centre 588 (SFB588) "Humanoid Robots - learning and 
cooperating multi-modal robots" was established by the German Research 
Foundation (DFG) in Karlsruhe in May 2000. The SFB588 is a cooperation of the 
University of Karlsruhe, the Forschungszentrum Karlsruhe (FZK), the Research 
Center for Information Technologies (FZI) and the Fraunhofer Institute for 
Information and Data Processing (IITB) in Karlsruhe. 

In this project, scientists from different academic fields develop concepts, methods and 
concrete mechatronic components and integrate them into a humanoid robot that can share 
its working space with humans. The long-term target is the interactive cooperation of robots 
and humans in complex environments and situations. For communication with the robot, 
humans should be able to use natural communication channels like speech, touch or 
gestures. The demonstration scenario chosen in this project is a household robot for various 
tasks in the kitchen. 

Humanoid robots are still a young technology with many research challenges. Only few 
humanoid robots are currently commercially available, often at high costs. Physical 
prototypes of robots are needed to investigate the complex interactions between robots 
and humans and to integrate and validate research results from the different research 
fields involved in humanoid robotics. The development of a humanoid robot platform 
according to a special target system at the beginning of a research project is often 
considered a time consuming hindrance. In this article a process for the efficient design of 
humanoid robot systems is presented. The goal of this process is to minimize the 
development time for new humanoid robot platforms by including the experience and 
knowledge gained in the development of humanoid robot components in the 
collaborative research centre 588. 

Weight and stiffness of robot components have a significant influence on energy 
efficiency, operating time, safety for users and the dynamic behaviour of the system in 
general. The finite element based method of topology optimization gives designers the 
possibility to develop structural components efficiently according to specified loads and 
boundary conditions without having to rely on coarse calculations, experience or 
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intuition. The design of the central support structure of the upper body of the humanoid 
robot ARMAR III is an example for how topology optimization can be applied in 
humanoid robotics. Finally the design of the upper body of the humanoid ARMAR III is 
presented in detail. 

2. Demand for efficient design of humanoid robots 

Industrial robots are being used in many manufacturing plants all over the world. This 
product class has reached a high level of maturity and a broad variety of robots for special 
applications is available from different manufacturers. Even though both kind of robots, 
industrial and humanoid, manipulate objects and the same types of components, e.g. 
harmonic drive gears, can be found in both types, the target systems differ significantly. 
Industrial robots operate in secluded environments strictly separated from humans. They 
perform a limited number of clearly defined repetitive tasks. These machines and the tools 
they use are often designed for a special purpose. High accuracy, high payload, high 
velocities and stiffness are typical development goals. 

Humanoid robots work together in a shared space with humans. They are designed as 
universal helpers and should be able to learn new skills and to apply them to new, 
previously unknown tasks. Humanlike kinematics allows the robot to act in an 
environment originally designed for humans and to use the same tools as humans in a 
similar way. Human appearance, behaviour and motions which are familiar to the user 
from interaction with peers make humanoid robots more predictable and increase their 
acceptance. Safety for the user is a critical requirement. Besides energy efficient drive 
technology, a lightweight design is important not only for the mobility of the system but 
also for the safety of the user as a heavy robot arm will probably cause more harm in case 
of an accident than a light and more compliant one. Due to these significant differences, 
much of the development knowledge and product knowledge from industrial robots 
cannot be applied to humanoid robots. 

The multi-modal interaction between a humanoid robot and its environment, the 
human users and eventually other humanoids cannot fully be simulated in its entire 
complexity. To investigate these coherences, actual humanoid robots and experiments 
are needed. Currently only toy robots and a few research platforms are commercially 
available, often at high cost. Most humanoid robots are designed and built according to 
the special focus or goals of a particular research project and many more will be built 
before mature and standardized robots will be available in larger numbers at lower 
prizes. Knowledge gained from the development of industrial robots that have been 
used in industrial production applications for decades cannot simply be reused in the 
design of humanoid robots due to significant differences in the target systems for both 
product classes. A few humanoid robots have been developed by companies, but not 
much is known about their design process and seldom is there any information 
available that can be used for increasing the time and cost efficiency in the development 
of new improved humanoid robots. Designing a humanoid robot is a long and iterative 
process as there are various interactions between e.g. mechanical parts and the control 
system. The goal of this article is to help shortening the development time and to 
reduce the number of iterations by presenting a process for efficient design, a method 
for optimizing light yet stiff support structures and presenting the design of the upper 
body of the humanoid robot ARMAR III. 
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3. Design process for humanoid robot modules 

The final goal of the development of humanoid robots is to reproduce the capabilities of a 
human being in a technical system. Even though several humanoid robots already exist and 
significant effort is put into this research field, we are still very far from reaching this goal. 
Humanoid robots are complex systems which are characterized by high functional and 
spatial integration. The design of such systems is a challenge for designers which cannot yet 
be satisfactorily solved and which is often a long and iterative process. Mechatronic systems 
like humanoid robots feature multi-technological interactions, which are displayed by the 
existing development processes, e.g. in the VDI guideline 2206 "design methodology for 
mechatronics systems" (VDI 2004), in a rather general and therefore abstract way. More 
specific development processes help to increase the efficiency of the system development. 
Humanoid robots are a good example for complex and highly integrated systems with 
spatial and functional interconnections between components and assembly groups. They are 
multi-body systems in which mechanical, electronic, and information-technological 
components are integrated into a small design space and designed to interact with each 
other. 

3.1 Requirements 

The demands result from the actions that the humanoid robot is supposed to perform. The 
robot designed in the SFB 588 will interact with humans in their homes, especially in the 
kitchen. It will take over tasks from humans, for example loading a dish washer. For this 
task it is not necessary, that the robot can walk on two legs, but it has to feature kinematics, 
especially in the arms, that enable it to reach for objects in the human surrounding. In 
addition, the robot needs the ability to move and to hold objects in its hand (Schulz, 2003). 

3.2 Subdivision of the total system 

The development of complex systems requires a subdivision of the total system into 
manageable partial systems and modules (Fig. 1). The segmentation of the total system of 
the humanoid robot is oriented on the interactions present in a system. The total system can 
be divided into several subsystems. The relations inside the subsystems are stronger 
compared to the interactions between these subsystems. One partial system of the 
humanoid robot is e.g. the upper body with the subsystem arm. The elements in the lowest 
level in the hierarchy of subsystems are here referred to as modules. In the humanoid 
robot's arm, these modules are hand-, elbow-, and shoulder joint. Under consideration of 
the remaining design, these modules can be exchanged with other modules that fulfil the 
same function. The modules again consist of function units, as e.g. the actuators for one of 
the module's joints. The function units themselves consist of components, here regarded as 
the smallest elements. In the entire drive, these components are the actuator providing the 
drive power and the components in the drive train connected in a serial arrangement, e.g. 
gears, drive belt, or worm gear transferring the drive power to the joint. 

3.3 Selection and data base 

Many components used in such highly integrated systems are commonly known, 
commercially available and do not have to be newly invented. However, a humanoid robot 
consists of a large number of components, and for each of them there may be a variety of 
technical solutions. This leads to an overwhelming number of possible combinations, which 
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cannot easily be overseen without help and which complicates an efficient target-oriented 
development. Therefore it is helpful to file the components of the joints, actuators and 
sensors as objects in an object-oriented classification. It enables a requirement-specific access 
to the objects and delivers information about possible combinations of components. 



Total System 



Subsystem 
Arm 



Function Uivit 
Drive 




Component 
Actuator 



Wrist 
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Fig. 1. Subdivision of the total system. 



3.4 Development sequence 

The development sequence conforms to the order in which a component or information has 
to be provided for the further procedure. The development process can be roughly divided 
into two main sections. The first section determines the basic requirements for the total 
system, which have to be known before the design process. This phase includes primarily 
two iterations: In the first iteration, the kinematics of the robot is specified according to the 
motion space of the robot and the kinematics again has to be describable in order to be 
controllable. In the second iteration, the control concept for the robot and the general 
possibilities for operating the joints are adjusted to the requirements for the desired 
dynamics of the robots. The second sector is the actual design process. The sequence in 
which the modules are developed is determined by their position in the serial kinematics of 
the robot. This means that e.g. in the arm, first the wrist, the elbow joint and then finally the 
shoulder joint are designed. 

Since generally all modules have a similar design structure, they can be designed 
according to the same procedure. The sequence in this procedure model is determined by 
the interactions between the function units and between the components. The relation 
between the components and the behaviour of their interaction in case of a change of the 
development order can be displayed graphically in a design structure matrix (Browning, 
2001). Iterations, which always occur in the development of complex systems, can be 
limited by early considering the properties of the components that are integrated at the 
end of the development process. One example is the torque measurement in the drive 
train. In the aforementioned data base, specifications of the components are given like the 
possibility for a component of the drive train to include some kind of torque 
measurement. It ensures that after the assembly of a drive train, a power measurement 
can be integrated. 
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3.5 Development of a shoulder joint 

The development of a robot shoulder joint according to this approach is exemplarily 
described in the following paragraphs. For the tasks that are required from the robot, it is 
sufficient if the robot is able to move the arm in front of its body. These movements can be 
performed by means of a ball joint in the shoulder without an additional pectoral girdle. In 
the available design space, a ball joint can be modelled with the required performance of the 
actuators and sensors as a serial connection of three single joints. The axes of rotation of 
these joints intersect at one point. A replacement joint is used which consists of a roll joint, a 
pitch joint, and then again of another roll joint. The description of the kinematics can only be 
clarified together with the entire arm, which requires limiting measures, especially if 
redundant degrees of freedom exist (Asfour, 2003). 

Information about the mass of the arm and its distribution are requirements for the design 
of the shoulder joint module. In addition, information about the connection of elbow and 
shoulder has to be available. This includes the components that are led from the elbow to or 
through the shoulder, as e.g. cables or drive trains of lower joints. The entire mechatronic 
system can be described in an abstract way by the object-oriented means of SysML (System 
Modelling Language) (SysML, 2005) diagrams, with which it is possible to perform a system 
test with regard to compatibility and operational reliability. It enables the representation of 
complex systems at different abstraction levels. Components that are combined in this way 
can be accessed in the aforementioned classification, which facilitates a quick selection of the 
components that can be used for the system. In addition, it makes a function design model 
possible at every point of the development. 

Kinematic* JomlH wilh IW P ect to 

drive* 



y 




Shoulder module Dines will) respect to 

^t sensors ^^^^ 

Fig. 2. Design of the shoulder module. 

In the development of the shoulder module (Fig. 2), at first the function units of the joints for 
the three rotating axes are selected according to the kinematics. Then, the function unit 
drive, including the actuators and the drive trains, are integrated. Hereafter, the sensors are 
selected and integrated. In order to prevent time consuming iterations in the development, 
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the components of the total system, integrated at a later stage, are already considered from 
the start with regard to their general requirements for being integrated. Examples for this 
are the sensors, which can then be assembled without problems since it is made sure that the 
already designed system offers the possibility to integrate them. During the next step the 
neighbouring module is designed. Information about the required interface of the shoulder 
and the mass of the arm and its distribution are given to the torso module. 

4. Topology optimization 

Topology optimization is used for the determination of the basic layout of a new design. It 
involves the determination of features such as the number, location and shape of holes and 
the connectivity of the domain. A new design is determined based upon the design space 
available, the loads, materials and other geometric constraints, e.g. bearing seats of which 
the component is to be composed of. 

Today topology optimization is very well theoretically studied (Bendsoe & Sigmund, 2003) 
and applied in industrial design processes (Pedersen & Allinger, 2005). The designs 
obtained using topology optimization are considered design proposals. These topology 
optimized designs can often be rather different compared to designs obtained with a trial 
and error design process or designs obtained upon improvements based on experience or 
intuition as can be deduced from the motor carrier example in Fig. 3. Especially for complex 
loads, which are typical for systems like humanoid robots, these methods of structural 
optimization are helpful within the design process. 






Design space for topology optimization Constructional implementation 

Fig. 3. Topology optimization of a gear oil line bracket provided by BMW Motoren GmbH. 

The standard formulation in topology optimization is often to minimize the compliance 
corresponding to maximizing the stiffness using a mass constraint for a given amount of 
material. Compliance optimization is based upon static structural analyses, modal 
analyses or even non-linear problems e.g. models including contacts. A topology 
optimization scheme as depicted in Fig. 4. is basically an iterative process that integrates a 
finite element solver and an optimization module. Based on a design response supplied 
by the FE solver like strain energy for example, the topology optimization module 
modifies the FE model. 

The FE model is typically used together with a set of loads that are applied to the model. 
These loads do not change during the optimization iterations. An MBS extended scheme as 
introduced by (Haussler et al., 2001) can be employed to take the dynamic interaction 
between the FE model and the MBS system into account. 
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Fig. 4. Topology optimization scheme. 

4.1 Topology optimization of robot thorax 

The design of the central support structure of the upper body, the thorax, of the humanoid 
robot ARMAR III was determined with the help of topology optimization. The main 
functions of this element are the transmission of forces between arms, neck and torso joint 
and the integration of mechanical and electrical components, which must be accommodated 
for inside the robot's upper body. For instance four drive units for the elbows have to be 
integrated in the thorax to reduce the weight of the arms, electrical components like two PC- 
104s, four Universal Controller Modules (UCoM), A/D converters, DC/ DC converters and 
force-moment controllers. 




Fig. 5. Topology optimization of the thorax. 
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The left picture in figure 5 shows the initial FE model of the available design space including 
the geometric boundary conditions like the mechanical interfaces for the adjoining modules 
neck, arms and torso joint as well as the space reserved for important components like 
computers and controllers. Together with a set of static loads, this was the input for the 
optimization process. The bottom left picture shows the design as it was suggested by the 
optimization module after the final optimization loop. This design was then manually 
transferred into a 3d model in consideration of manufacturing restrictions. The picture on 
the right in Fig. 5 shows the assembled support structure made from high-strength 
aluminium plates. The result of the optimization is a stiff and lightweight structure with a 
total mass of 2.7 kg. 



5. The upper body of ARMAR III 

ARMAR III is s a full-size humanoid Robot which is the current demonstrator system of the 
collaborative research centre 588. It consists of a sensor head for visual and auditory 
perception of the environment, an upper body with two arms with a large range of motion for 
the manipulation of objects and a holonomic platform for omni-directional locomotion. 
ARMAR III has a modular design consisting of the following modules: head, neck joint, 
thorax, torso joint and two arms which are subdivided into shoulder, elbow, wrist and hands. 
The head and the holonomic platform were developed at the Research Center for Information 
Technologies (FZI), the hands were developed at the Institute for Applied Computer Science at 
the Forschungszentrum Karlsruhe (Beck et al, 2003; Schulz 2003). The modules for neck, torso 
and arms shown in the following figure were designed and manufactured at the Institute of 
Product Development (IPEK) at the University of Karlsruhe (TH). 




Fig. 6. The upper body of the humanoid robot ARMAR III. 
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Fig. 7. Kinematics and CAD model of upper body of ARMAR III. 

The size of the design space and the motion space of ARMAR III are similar to that of a 
human person with a height of approximately 175 cm. The main dimensions of the upper 
body can be seen in Fig. 8. 

Table 1 gives an overview of the degrees of freedom and the motion range of all modules. 
Both arms have seven degrees of freedom. The three degrees of freedom in the shoulder 
provide a relatively wide range of motion. Together with two degrees of freedom in the 
elbow as well as in the wrist, the arm can be used for complex manipulation tasks that 
occur in the primary working environment of ARMAR III, the kitchen. Compared with 
other humanoid robots, the arm of ARMAR III provides large and humanlike ranges of 
motion. The neck joint with four degrees of freedom allows humanlike motion of the 
head. 
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Fig. 8. Dimension of upper body. 
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Degree of 
freedom 


Part 


D.O.F 


amount 


total 


Wrist 

Elbow 

Shoulder 

Neck 

Torso 


2 
2 
3 
4 
3 


2 
2 
2 
1 
1 


4 
4 
6 
4 
3 


Upper body 21 


Range of 
motion 


Wrist 


9i 

e 2 


-30° to 30° 
-60° to 60° 


Elbow 


e 3 
e 4 


-90° to 90° 
-10° to 150° 


Shoulder 


e 5 
e 6 
e 7 


-180° to 180° 
-45° to 180° 
-10° to 180° 


Neck 


e 8 

010 

On 


-180° to 180° 
-45° to 45° 
-45° to 45° 
-60° to 60° 


Torso 


012 
013 
014 


-180° to 180° 
-10° to 60° 
-20° to 20° 



Table 1. Degrees of freedom with range of motion. 



5.1 Shoulder joint 

The shoulder joint is the link between the arm and the torso. In addition to the realization of 
three degrees of freedom with intersecting axes in one point, the bowden cables for driving 
the elbow joint are guided through the shoulder joint from the elbow drive units in the torso 
to the elbow. The drive units of all joints are designed in a way, that their contributions to 
the inertia are as small as possible. Therefore the drive unit for panning the arm (Rot. 1), 
which has to provide the highest torque in the arm, is attached directly to the torso and does 
not contribute to the inertia of the moving part of the arm. The drive units for raising the 
arm (Rot. 2) and turning the arm around its longitudinal axis (Rot. 3) have been placed 
closely to the rotational axes to improve the dynamics of the shoulder joint. In order to 
achieve the required gear ratios in the very limited design space, Harmonic Drive 
transmissions, worm gear transmissions and toothed belt transmissions have been used. 
These elements allow a compact design of the shoulder with a size similar to a human 
shoulder. As all degrees of freedom are realized directly in the shoulder, the design of the 
upper arm is slender. The integration of torque sensors in all three degrees of freedom is 
realized in two different ways. For the first degree of freedom strain gages are attached to a 
torsion shaft which is integrated in the drive train. The torque for raising and turning the 
arm is determined by force sensors that measure the axial forces in the worm gear shafts. In 
addition to the encoders, which are attached directly at the motors, angular sensors for all 
three degrees of freedom are integrated into the drive trains of the shoulder joints. The 
position sensors, which are located directly at the joints, allow quasi-absolute angular 
position measurement based on incremental optical sensors. A touch-sensitive artificial skin 
sensor, which can be used for collision detection or intuitive tactile communication, is 
attached to the front and rear part of the shoulder casing (Kerpa et al., 2003). 
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Fig. 9. Side view of the shoulder joint. 



5.2 Elbow joint and upper arm 

The elbow joint of ARMAR III has two degrees of freedom. These allow bending as well as 
rotating the forearm. The drive units, consisting of motor and Harmonic Drive transmissions, 
are not in the arm, but are located in the thorax of the robot. Thus the moving mass of the arm 
as well as the necessary design space are reduced, which leads to better dynamic 
characteristics and a slim form of the arm. The additional mass in the thorax contributes 
substantially less to the mass inertia compared to placing the drive units in the arm. 
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Fig. 10. Elbow joint. 
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Due to this concept, load transmission is implemented with the use of wire ropes, which are 
led from the torso through the shoulder to the elbow by rolls and bowden cables. In order to 
realize independent control of both degrees of freedom, the wire ropes for turning the 
forearm are led through the axis of rotation for bending the elbow. With altogether twelve 
rolls, this rope guide realizes the uncoupling of the motion of bending the elbow from 
rotating the forearm. In contrast to the previous version of the elbow, where the steel cables 
were guided by Bowden cables, this solution leads to smaller and constant friction losses 
which is advantageous for the control of this system. 

Similar to the shoulder, the angular measurement is realized by encoders attached directly 
to the motors as well as optical sensors that are located directly at the joint for both degrees 
of freedom. In order to measure the drive torque, load cells are integrated in the wire ropes 
in the upper arm. As each degree of freedom in the elbow is driven by two wire ropes the 
measuring of force in the wire ropes can be done by differential measurements. Another 
possibility for measuring forces offers the tactile sensor skin, which is integrated in the 
cylindrical casing of the upper arm. 

By placing the drive units in the thorax, there is sufficient design space left in the arm which 
can be used for electronic components that process sensor signals and which can be installed 
in direct proximity to the sensors in the upper arm. 



5.3 Wrist joint and forearm 

The wrist has two rotational degrees of freedom with both axes intersecting in one point. 
ARMAR III has the ability to move the wrist to the side as well as up and down. This was 
realized by a universal joint in very compact design. The lower arm is covered by a 
cylindrical casing with an outer diameter of 90 mm. The motors for both degrees of freedom 
are fixed at the support structure of the forearm. The gear ratio is obtained by a ball screw 
and a toothed belt or a wire rope respectively. The load transmission is almost free from 
backlash. 
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Fig. 11. Forearm with two degrees of freedom in the wrist. 
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By arranging the motors close to the elbow joint, the centre of mass of the forearm is 
shifted towards the body, which is an advantage for movements of the robot arm. 
Angular measurement in the wrist is realized by encoders at the motors and with 
quasi-absolute angular sensors directly at the joint. To measure the load on the hand, a 
6-axis force and torque sensor is fitted between the wrist and the hand (Beck et al., 
2003) (not shown in Fig. 11). The casing of the forearm is also equipped with a tactile 
sensor skin. The support structure of the forearm consists of a square aluminium 
profile. This rigid lightweight structure offers the possibility of cable routing on the 
inside and enough space for mounting electronic components on the flat surfaces of 
the exterior. 



5.4 Neck joint 

The complex kinematics of the human neck is defined by seven cervical vertebrae. Each 
connection between two vertebrae can be seen as a joint with three degrees of freedom. For 
this robot, the kinematics of the neck has been reduced to a serial kinematics with four 
rotational degrees of freedom. Three degrees of freedom were realized in the basis at the 
lower end of the neck. Two degrees of freedom allow the neck to lean forwards and 
backwards (1) and to both sides (2), another degree of freedom allows rotation around the 
longitudinal axis of the neck. At the upper end of the neck, a fourth degree of freedom 
allows nodding of the head. This degree of freedom allows more human-like movements of 
the head and improves the robots ability to look up and down and to detect objects directly 
in front of it. 




Fig. 12. Neck joint with four degrees of freedom. 
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For the conversion of torque and rotational speed, the drive train of each degree of 
freedom consists of Harmonic Drive transmissions either as only transmission element 
or, depending on the needed overall gear ratio, in combination with a toothed gear 
belt. 

The drives for all degrees of freedom in the neck are practically free from backlash. The 
motors of all degrees of freedoms are placed as close as possible to the rotational axis in 
order to keep the moment of inertia small. The sensors for the angular position 
measurement in the neck consist of a combination of incremental encoders, which are 
attached directly to the motors, and quasi-absolute optical sensors, which are placed directly 
at the rotational axis. The neck module as depicted above weighs 1.6 kg. 

5.5 Torso joint 

The torso of the upper body of ARMAR III is divided into two parts, the thorax and the 
torso joint below it. The torso joint allows motion between the remaining upper body and 
the holonomic platform, similar to the functionality of the lower back and the hip joints in 
the human body. The kinematics of the torso joint does not exactly replicate the complex 
human kinematics of the hip joints and the lower back. The complexity was reduced in 
consideration of the functional requirements which result from the main application 
scenario of this robot in the kitchen. The torso joint has three rotational degrees of 
freedom with the axes intersecting in one point. The kinematics of this joint, as it is 
described in table 1 and Fig. 13, is sufficient to allow the robot to easily reach important 
points in the kitchen. For example in a narrow kitchen, the whole upper body can turn 
sideways or fully around without having to turn the platform. One special requirement 
for the torso joint is, that all cables for the electrical energy flow and information flow 
between the platform and the upper body need to go through the torso joint. All cables 
are to be led from the upper body to the torso joint in a hollow shaft with an inner bore 
diameter of 40 mm through the point of intersection of the three rotational axes. This 
significantly complicates the design of the joint, but the cable connections can be shorter 
and stresses on the cables due to compensational motions, that would be necessary if the 
cable routing was different, can be reduced. This simplifies the design of the interface 
between upper and lower body. For transportation of the robot, the upper and lower part 
of the body can be separated by loosening one bolted connection and unplugging a few 
central cable connections. Due to the special boundary conditions from the cable routing, 
all motors had to be placed away from the point of intersection of the three axes and the 
motor for the vertical degree of freedom Rot. 3 could not be positioned coaxially to the 
axis of rotation. The drive train for the degrees of freedom Rot. 1 and Rot. 3 consists of 
Harmonic Drive transmissions and toothed belt transmissions. 

The drive train for the degree of freedom Rot. 2 is different from most of the other drive 
trains in ARMAR III as it consists of a toothed belt transmission, a ball screw and a 
piston rod which transforms the translational motion of the ball screw into the 
rotational motion for moving the upper body sideways. This solution is suitable for the 
range of motion of 40°, it allows for a high gear ratio and the motor can be placed away 
from the driven axis and away from the point of intersection of the rotational axes. 
In addition to the encoders, which are directly attached to the motors, two precision 
potentiometers and one quasi-absolute optical sensor are used for the angular position 
measurement. 
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Fig. 13. Torso joint. 



6. Conclusions and future work 



Methods for the efficient development of modules for a humanoid robot were developed. 
Future work will be to create a database of system elements for humanoid robot components 
and the characterization for easier configuration of future humanoids. This database can then 
be used to generate consistent principle solutions for robot components more efficiently. 
Topology optimization is a tool for designing and optimizing robot components which need to 
be light yet stiff. The thorax of ARMAR III was designed with the help of this method. For the 
simulation of mechatronic systems like humanoid robots, it is necessary to consider 
mechanical aspects as well as the behaviour of the control system. This is not yet realized in 
the previously described topology optimization process. The coupling between the mechanical 
system and the control system might influence the overall system's dynamic behaviour 
significantly. As a consequence, loads that act on a body in the system might be affected not 
only by the geometric changes due to optimization but also by the control system as well. The 
topology optimization scheme shown in Fig. 4 should be extended by means of integrating the 
dynamic system with a multi body simulation and the control system as depicted in Fig. 14. 
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Fig. 14. Controlled MBS extended topology optimization. 

The upper body of the humanoid robot ARMAR III was presented. The modules for neck, 
arms and torso were explained in detail. The main goals for the future work on ARMAR III 
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are to further reduce the weight and to increase the energy efficiency, increase the payload 
and to design a closed casing for all robot joints. 
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1. Introduction 

Studies on biped robots have attracted interest due to such problems as inherent poor 
stability and the cooperation of a large degree of freedom. Furthermore, recent advanced 
technology, including hardware and software, allows these problems to be tackled, 
accelerating the interest. Actually, many sophisticated biped robots have already been 
developed that have successfully achieved such various motions as straight walking, 
turning, climbing slopes, rising motion, and running (Aoi & Tsuchiya, 2005; Aoi et al., 2004; 
Hirai et al, 1998; Kuniyoshi et al., 2004; Kuroki et al. 2003; Loffler et al, 2003; Nagasaki et al., 
2004). 

Steady gait for a biped robot implies a stable limit cycle in its state space. Therefore, 
different steady gait patterns have different limit cycles, and gait transition indicates that the 
state of the robot moves from one limit cycle to another. Even if the robot obtains steady gait 
patterns, their transition is not necessarily confirmed as completed. Thus, smooth transition 
between gait patterns remains difficult. To overcome such difficulty, many studies have 
concentrated on model-based approaches using inverse kinematics and kinetics. These 
approaches basically generate robot motions based on such criteria as zero moment point 
(Vukobratovic et al., 1990) and manipulate robot joints using motors. However, they require 
accurate modeling of both the robot and the environment as well as complicated 
computations. The difficulty of achieving adaptability to various environments in the real 
world is often pointed out, which means that in these approaches the robot is too rigid to 
react appropriately to environmental changes. Therefore, the key issue in the control is to 
establish a soft robot by adequately changing the structure and response based on 
environmental changes. 

In contrast to robots, millions of animal species adapt themselves to various environments 
by cooperatively manipulating their complicated and redundant musculoskeletal systems. 
Many studies have elucidated the mechanisms in their motion generation and control. In 
particular, neurophysiological studies have revealed that muscle tone control plays 
important roles in generating adaptive motions (Mori, 1987; Rossignol, 1996; Takakusaki et 
al., 2003), suggesting the importance of compliance in walking. Actually, many studies on 
robotics have demonstrated the essential roles of compliance. Specifically, by appropriately 
employing the mechanical compliance of robots, simple control systems have attained 
highly adaptive and robust motions, especially in hexapod (Altendorfer et al., 2001; Cham et 
al, 2004; Quinn et al, 2003; Saranli et al, 2001), quadruped (Fukuoka et al., 2003; Poulakakis 
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et al., 2005), and biped robots (Takuma & Hosoda, 2006; Wisse et al., 2005). However, note 
that control systems using motors continue to have difficulty adequately manipulating 
compliance in robot joints. 

On the other hand, neurophysiological studies have also clarified that animal walking is 
generated by central pattern generators (CPGs) that generate rhythmic signals to activate 
their limbs (Grillner, 1981, 1985; Orlovsky et al., 1999). CPGs modulate signal generation in 
response to sensory signals, resulting in adaptive motions. CPGs are widely modeled using 
nonlinear oscillators (Taga et al., 1991; Taga, 1995a,b), and based on such CPG models many 
walking robots and their control systems have been developed, in particular, for quadruped 
robots (Fukuoka et al., 2003; Lewis & Bekey, 2002; Tsujita et al., 2001), multi-legged robots 
(Akimoto et al., 1999; Inagaki et al., 2003), snake-like robots (Ijspeert et al., 2005; Inoue et al., 
2004), and biped robots (Aoi & Tsuchiya, 2005; Aoi et al., 2004; Lewis et al., 2003; Nakanishi 
et al., 2004). 

This paper deals with the transition from quadrupedal to bipedal locomotion of a biped 
robot while walking. These gait patterns originally have poor stability, and the transition 
requires drastic changes in robot posture, which aggravates the difficulty of establishing the 
transition without falling over. Our previous work developed a simple control system using 
nonlinear oscillators by focusing on CPG characteristics that are used for both quadruped 
and biped robots, revealing that they achieved steady and robust walking verified by 
numerical simulations and hardware experiments (Aoi & Tsuchiya, 2005; Aoi et al., 2004; 
Tsujita et al., 2001). In this paper, we use the same developed control system for both 
quadrupedal and bipedal locomotion of a biped robot and attempt to establish smooth gait 
transition. Specifically, we achieve stable limit cycles of these gait patterns and their 
transitions by moving the robot state from one limit cycle to another by cooperatively 
manipulating their physical kinematics through numerical simulations. This paper is 
organized as follows. Section 2 introduces the biped robot model considered in this paper. 
Section 3 explains the developed locomotion control system, and Section 4 addresses the 
approach of gait transition and numerical results. Section 5 describes the discussion and 
conclusion. 

2. Biped robot model 

Figure 1(a) shows the biped robot model considered in this paper. It consists of a trunk, a 
pair of arms composed of four links, and a pair of legs composed of six links. Each link is 
connected to the others through a single degree of freedom rotational joint. A motor is 
installed at each joint. Four touch sensors are attached to the sole of each foot, and one touch 
sensor is attached to the tip of the hand of each arm. The left and right legs are numbered 
Legs 1 and 2, respectively. The joints of the legs are also numbered Joints 1...6 from the side 
of the trunk, where Joints 1, 2, and 3 are yaw, roll, and pitch hip joints, respectively. Joint 4 is 
a pitch knee joint, and Joints 5 and 6 are pitch and roll ankle joints. The arms are also 
numbered in a similar manner. Joints 1 and 4 are pitch joints, Joint 2 is a roll joint, and Joint 
3 is a yaw joint. To describe the configuration of the robot, we introduce angles Q< and 0' 

(i-1,2, y=l,..,,4, fc=l,...,6), which are rotation angles of Joint j of Arm i and Joint k of Leg i, 
respectively. The robot walks quadrupedally and bipedally, as shown in Figs. 1(b) and (c). 
Its physical parameters are shown in Table 1. The ground is modeled as a spring with a 
damper in numerical simulations. 
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(a) Biped robot (b) Quadrupedal locomotion 

Fig. 1. Schematic model of a biped robot [mm]. 



(c) Bipedal locomotion 



Link 


Mass [kg] 


Length [m] 


Trunk 


2.34 


0.20 


Leg 


1.32 


0.28 


Arm 


0.43 


0.25 


Total 


5.84 


0.48 



Table 1. Physical parameters of robot. 



3. Locomotion control system 

3.1 Concept of the control system 

As described above, the crucial issue in controlling a biped robot is establishing a 
mechanism in which the robot adapts itself by changing its internal structure based on 
interactions between the robot's mechanical system and the environment. 
Neurophysiological studies have revealed that animal walking is generated by CPGs 
comprised of a set of neural oscillators present in the spinal cord. CPGs characteristically 
have the following properties: 

1. CPGs generate inherent rhythmic signals that activate their limbs to generate 
rhythmic motions; 

2. CPGs are sensitive to sensory signals from peripheral nerves and modulate signal 
generation in response to them. 

Animals can immediately adapt to environmental changes and disturbances by virtue of 
these features and achieve robust walking. 

We have designed a locomotion control system that has an internal structure that adapts to 
environmental changes, referring to CPG characteristics. In particular, we employed 
nonlinear oscillators as internal states that generate inherent rhythmic signals and 
adequately respond to sensory signals. Since the motor control of a biped robot generally 
uses local high-gain feedback control to manipulate the robot joints, we generated nominal 
joint motions using rhythmic signals from the oscillators. One of the most important factors 
in the dynamics of walking is the interaction between the robot and the external world, that 
is, dynamical interaction between the robot feet and the ground. The leg motion consists of 
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swing and stance phases, and a harmonious balance must be achieved between these 
kinematical motions and dynamical interaction, which means that it is essential to 
adequately switch from one phase to another. Therefore, our developed control system 
focused on this point. Specifically, it modulated the signal generation of the oscillators and 
appropriately changed the leg motions from the swing to the stance phase based on touch 
sensors. Although we concisely describe the developed control system below, see our 
previous work (Aoi & Tsuchiya, 2005) for further details. 



3.2 Developed locomotion control system 

The locomotion control system consists of a motion generator and controller (see Fig. 2(a)). 
The former is composed of rhythm and trajectory generators. The rhythm generator has two 
types of oscillators: Motion and Inter (see Fig. 2(b)). As Motion oscillators, there are Leg 1, 
Leg 2, Arm 1, Arm 2, and Trunk oscillators. The oscillators follow phase dynamics in which 
they have interactions between themselves and receive sensory signals from touch sensors. 
The trajectory generator creates nominal trajectories of robot joints by phases of Motion 
oscillators, which means that it generates physical kinematics of the robot based on 
rhythmic signals from the oscillators. It receives outer commands and changes the physical 
kinematics to reflect the outer commands. The nominal trajectories are sent to the motion 
controller in which motor controllers manipulate the joint motions using the nominal 
trajectories as command signals. Note that physical kinematics is different between 
quadrupedal and bipedal locomotion, and except for the kinematics, throughout this paper 
we use the same control system regardless of gait patterns. 
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(a) Architecture of the control system 
Fig. 2. Locomotion control system. 



(b) Rhythm fiener<io(r and oscillators 



3.2. 1 Trajectory generator 

As mentioned above, the trajectory generator creates nominal trajectories of all joints based 
on the phases of the Motion oscillators. First, let ml , m' , q> T / ar| d m (' = L2) be the phases of 

Leg (', Arm i, Trunk, and Inter oscillators, respectively. 

The nominal trajectories of the leg joints are determined by designing the nominal trajectory 

of the foot, specifically Joint 5, relative to the trunk in the pitch plane. The nominal foot 
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trajectory consists of swing and stance phases (see Fig. 3). The former is composed of a 
simple closed curve that includes anterior extreme position (AEP) and posterior extreme 
position (PEP). This trajectory starts from point PEP and continues until the leg touches the 
ground. On the other hand, the latter consists of a straight line from the foot landing 
position (LP) to point PEP. Therefore, this trajectory depends on the timing of foot contact 
with the ground in each step cycle. Both in the swing and stance phases, nominal foot 
movement is designed to be parallel to the line that involves points AEP and PEP. The 
height and forward bias from the center of points AEP and PEP to Joint 3 of the leg are 
defined as parameters A, and H, , respectively. These two nominal foot trajectories provide 

nominal trajectories 0' (t=l ,2,/=3,4,5) of Joint j (hip, knee, and ankle pitch joints) of Leg i by 

the functions of phase ml of Leg i oscillator written by §' (ml ), where we use ml = at 

point PEP and m' =m at point AEP. Note that nominal stride S is given by the distance 

between points AEP and PEP, and duty factor R is given by the ratio between the nominal 

stance phase and step cycle durations. 

A,. Ar 





PEP 



LP • 

Swing phase Stance phase 

Fig. 3. Nominal foot trajectory. 

The nominal trajectories of the arm joints are generated in a similar way to the leg joints 
described above except for the bend direction between Joint 4 of the arm and Joint 4 of the 
leg (see Fig. 4 below). Similar to the foot trajectory, the nominal trajectory of the hand, 
specifically the touch sensor at the tip of the arm, is designed relative to the trunk in the 
pitch plane, which consists of the swing and stance phases. Then, also from inverse 
kinematics, nominal trajectories 0' (i=l,2, j=l,...,4) of Joint j of Arm i are given by the 

functions of phase m ' of Arm i oscillator. The nominal trajectories of the arm joints also 

have parameters A. and H, , similar to those of the leg joints, that use the same nominal 

stride S and duty ratio Q as leg motions. 



3.2.2 Rhythm generator and sensory signals 

In the rhythm generator, Motion and Inter oscillators generate rhythmic behavior based on 
the following phase dynamics: 



?i= a + gu 




f T =a + g lT 




<p , A=^ + g t lA + g2A 


i = 1,2 


<P'L=V+g'lL+g2L 


i = 1,2 
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(1) 



where p , a , n' , and q> (i=l,2) are functions regarding the nominal phase 
relationship shown below, a' and n' (t=l,2) are functions arising from sensory 
signals given below, and a is the nominal angular velocity of each oscillator obtained 

by 

w = 2k^- (2) 

T 

± sw 

where T is the nominal swing phase duration. 

To establish stable walking, the essential problem is the coordination of joint motions. 
Interlimb coordination is the key. For example, both legs must move out of phase to 
prevent the robot from falling over during bipedal locomotion. Since the nominal 
joint trajectories for the limbs are designed by oscillator phases, interlimb 
coordination is given by the phase relation, that is, the phase differences between 
oscillators. Functions o , o , a' , and »' in Eq. (1), which deal with interlimb 

coordination, are given by the phase differences between oscillators based on Inter 
oscillator, written by 

gn = "ZL K l sm(<p, -<p[ + (-lfn/2) 

g 1T =-K T sin(cp T -(p I ) 

g'iA=-K A sm(cp , A -(p l +(-\y ji/2) i=l,2 

g[ L =-K L sm( ( p[- (Pi -(-\yji/2) i = \,2 
where nominal phase relations are given so that both the arms and legs move out of phase 
and one arm and the contralateral leg move in phase and K, , K, , and K T are gain 
constants. 

In addition to physical kinematics and interlimb coordination, the modulation of walking 
rhythm is another important factor to generate walking. Functions <?• and n' modulate 

walking rhythm through the modulation of the phases of oscillators based on sensory 
signals. Specifically, when the hand of Arm i (the foot of Leg (') lands on the ground, Arm i 
oscillator (Leg i oscillator) receives a sensory signal from the touch sensor (i=l,2). Instantly, 
phase m' of Arm i oscillator (phase m' of Leg i oscillator) is reset to nominal value m 

from value m' at the landing. Therefore, functions &' and o' are written by 

g'2A=(9AE P -9L d )8(t-tl nd ) 1=1,2 

82L=(9ABp-<PLi)8(t-tLa) i = l > 2 

where m AFP = 2ji(\ — B), V is the time when the hand of Arm i (the foot of Leg (') lands on 
the ground (i = l,2) and SO) denotes Dirac's delta function. Note that touch sensor signals not 
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only modulate the walking rhythm but also switch the leg motions from the swing to stance 
phase, as described in Sec. 3.2.1. 

4. Gait transition 

This paper aims to achieve gait transition from quadrupedal to bipedal locomotion of a 
biped robot. As mentioned above, even if the robot establishes steady quadrupedal and 
bipedal locomotion, that is, each locomotion has a stable limit cycle in the state space of the 
robot, there is no guarantee that the robot can accomplish a transition from one limit cycle of 
quadrupedal locomotion to another of bipedal locomotion without falling over. 
Furthermore, these gait patterns originally have poor stability, and the transition requires 
drastic changes in robot posture, which aggravates the difficulty of preventing the robot 
from falling over during the transition. 

4.1 Gait transition control 

A biped robot generates quadrupedal and bipedal locomotion while manipulating many 
degrees of freedom in the joints. These gait patterns have different movements in many 
joints, which means that there are a million ways to achieve the transition. Therefore, the 
critical issue is designing gait transition. 

In this paper, we generate both quadrupedal and bipedal locomotion of the robot based on 
physical kinematics. Figures 4(a) and (b) show schematics and parameters in quadrupedal 
and bipedal locomotion where COM indicates the center of mass of the trunk, L and L 

are the lengths from COM to Joint 1 of the arm and Joint 3 of the leg in the pitch plane, 
respectively, L. and L, are the forward biases from COM to the centers of the nominal foot 
and hand trajectories, respectively, m is the pitch angle of the trunk relative to the 
perpendicular line to the line that involves points AEP and PEP of the foot or the hand 

trajectory, and * and * indicate the values of * for quadrupedal and bipedal locomotion, 
respectively. Therefore, from a kinematical viewpoint, gait transition is achieved by 
changing parameters L A , L L , H A , H L , and y T from values L A , L®, H%, Hf 3 , and ip® to 

values L B A (=0), L B h (-0), H A , Hj 5 , and ^, while the robot walks. Note that from these 
figures, using parameters A A , A L ,and m T , parameters L A and L L are written as 

L A =Z TA sini/; T +A A 
L L =Z TL sin!/; T +A L 

Animals generate various motions and smoothly change them by cooperatively 
manipulating their complicated and redundant musculoskeletal systems. To elucidate these 
mechanisms, many studies have investigated recorded electromyographic (EMG) activities, 
revealing that muscle activity patterns are expressed by the combination of several patterns, 
despite their complexity (dAvella & Bizzi, 2005; Patla et al., 1985; Ivanenko et al., 2004, 2006). 
Furthermore, various motions have common muscle activity patterns, and different motions 
only have a few different specific patterns in combinations that express muscle activity 
patterns. This suggests that only a few patterns provide cooperation in different complex 
movements. 
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-Ik, 




(a) Quadrupedal locomotion (b) Bipedal locomotion 

Fig. 4. Schematics and parameters in quadrupedal and bipedal locomotion. 

Therefore, in this paper first we introduce a couple of parameters and then attempt to achieve 
gait transition by cooperatively changing the physical kinematics from quadrupedal to bipedal 
using the parameters. Specifically, two parameters, f and £ , are introduced, and parameters 

A , A , H, , H, , and m are designed as functions of parameters £ and £ by 

A A (£,, £ 2 ) = A°-{Z TA sin Vt (£,,&) + *«}£, 
A L &,^) = A Q L -{/ TL sin VT (^ 1 ,^ 2 ) + A Q L K 1 
H A (§u§ 1 ) = H Q A +(H B A -H Q A )§ 2 

h l &^ 2 ) = h l q +(Hl-h l q )^ 



(6) 



This aims to use parameters £ and £ to change the distance between the foot and hand 
trajectories and the posture of the trunk, respectively. Using this controller, gait transition is 
achieved by simply changing introduced parameters ( £ , £ ) from (0, 0) to (1, 1), as shown in Fig. 5. 



& 




(1, 1) 
^ Bipedal 



Fig. 5. Trajectories in f 



(0,0)* 



Quadrupedal fi 

-£ plane for gait transition from quadrupedal to bipedal locomotion. 
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4.2 Numerical results 

In this section, we investigate whether the proposed control system establishes gait 
transition from quadrupedal to bipedal locomotion of the robot by numerical simulations. 
The following locomotion parameters were used: S =5 cm, Q =0.5, T =03 s, K T =10.0, 
K. =2.0, K, =2.0, / =6.9 cm, and L =7.6 cm; the remaining parameters for quadrupedal 
and bipedal locomotion are shown in Table 2. These parameters were decided so that the 
robot achieves steady quadrupedal and bipedal locomotion. That is, each locomotion has a 
stable limit cycle in the state space of the robot. Figures 6(a) and (b) show the roll motions of 
the robot relative to the ground during quadrupedal and bipedal locomotion, respectively, 
illustrating the limit cycles in these gait patterns. Note that due to setting 0=0.5 and the 

nominal phase differences of the oscillators as described in Sec. 3.2.2, a trot appears in the 
quadrupedal locomotion in which the robot is usually supported by one arm and the 
contralateral leg. 



Parameter 


Quadrupedal ( * Q ) 


Bipedal (* B ) 


A A [cm] 


-3.0 


1.4 


A L [cm] 


4.0 


-1.6 


H A [cm] 


22.2 


22.2 


H L [cm] 


14.0 


16.5 


ip T [deg] 


72 


12 



Table 2. Parameters for quadrupedal and bipedal locomotion. 

1 — i — — , — — , — — , — l 




-0.04 



0.04 



-0.04 



-0.02 (1 0.02 

Angle [rad] 
(b) Bipedal locomotion 



0.04 



-0.02 0.02 

Angle [rad] 

(a) Quadrupedal locomotion 

Fig. 6. Roll motion relative to the ground. 

To accomplish gait transition, parameters £ and £ are changed to reflect outer commands. 
Specifically, a trajectory in £ — £ plane is designed as the following two successive steps (see Fig. 7): 

Step 1: while the robot walks quadrupedally, parameter £ increases from to £ 
during time interval T s, where Q<£, < 1 • 

Step 2: at the beginning of the swing phase of Arm 1, £ and £ increase from £ to 1 
and from to 1, respectively, during time interval T s. 
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Note that parameter f >0 geometrically indicates that the robot becomes supported only by 
its legs: that is, the appearance of bipedal locomotion. 



Bipedal 




(0,0) 



Quadrupedal (fi, 0) fi 

Fig. 7. Designed trajectory in £ - <f plane to change gait pattern from quadrupedal to 

bipedal locomotion. 

Figures 8(a) and (b) show the roll motion of the robot relative to the ground during gait 
transition, by parameter f =0.7. Specifically, in Fig. 8(a), time intervals T. and T 2 are both set 
at 20 s. Since the nominal step cycle is set at 0.6 s, the nominal kinematical trajectories of 
quadrupedal locomotion change slowly and gradually into bipedal locomotion, and it takes 
many steps to complete the change of gait patterns. Step 1 is from 10 to 30 s, and Step 2 is from 
30.2 to 50.2 s. During Step 1, the foot and the hand positions come closer together, and the roll 
motion of the robot decreases. At the beginning of Step 2, since the robot becomes supported 
only by its legs, roll motion suddenly increases. However, during Step 2 roll motion gradually 
approaches a limit cycle of bipedal locomotion, and after Step 2 the motion converges to it. 
That is, the robot accomplishes gait transition from quadrupedal to bipedal locomotion. On the 
other hand, in Fig. 8(b), time intervals T and T 2 are both set at 5 s. Step 1 is from 10 to 15 s, 
and Step 2 is from 15.1 to 20.1 s. In this case, although the nominal trajectories change 
relatively quickly, roll motion converges to a limit cycle of bipedal locomotion, and the robot 
achieves gait transition. Figure 9 displays the trajectory of the center of mass projected on the 
ground and the contact positions of the hands and the center of the feet on the ground during 
this gait transition. Figure 10 shows snapshots of this gait transition. 
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Fig. 8. Roll motion relative to the ground during gait transition from quadrupedal to bipedal 
locomotion. 
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Fig. 9. Trajectory of center of mass of robot projected on the ground and contact positions of 
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land on the ground. 
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Fig. 10. Snapshots of gait transition from quadrupedal to bipedal locomotion. 
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5. Discussion 

Kinematical and dynamical studies on biped robots are important for robot control. As 
described above, although model-based approaches using inverse kinematics and kinetics 
have generally been used, the difficulty of establishing adaptability to various environments 
as well as complicated computations has often been pointed out. In this paper, we employed 
an internal structure composed of nonlinear oscillators that generated robot kinematics and 
adequately responded based on environmental situations and achieved dynamically stable 
quadrupedal and bipedal locomotion and their transition in a biped robot. Specifically, we 
generated robot kinematical motions using rhythmic signals from internal oscillators. The 
oscillators appropriately responded to sensory signals from touch sensors and modulated 
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the rhythmic signals and physical kinematics, resulting in dynamical stable walking of the 
robot. This means that a robot driven by this control system established dynamically stable 
and adaptive walking through the interaction between the dynamics of the mechanical 
system, the oscillators, and the environment. Furthermore, this control system needed 
neither accurate modeling of the robot and the environment nor complicated computations. 
It just relied on the timing of the touch sensor signals: it is a simple control system. 
Since biped robots generate various motions manipulating many degrees of freedom, the 
key issue in control remains how to design their coordination. In this paper, we expressed 
two types of gait patterns using a set of several kinematical parameters and introduced two 
independent parameters that parameterized the kinematical parameters. Based on the 
introduced parameters, we changed the gait patterns and established gait transition. That is, 
we did not individually design the kinematical motion of all robot joints, but imposed 
kinematical restrictions on joint motions and contracted the degrees of freedom to achieve 
cooperative motions during the transition. Furthermore, we used the same control system 
between quadrupedal and bipedal locomotion except for the physical kinematics, which 
facilitated the design of such cooperative motions and established a smooth gait transition. 
As mentioned above, the analysis of EMG patterns in animal motions clarified that common 
EMG patterns are embedded in the EMG patterns of different motions, despite generating 
such motions using complicated and redundant musculoskeletal systems (d'Avella & Bizzi, 
2005; Patla et al., 1985; Ivanenko et al., 2004, 2006), suggesting an important coordination 
mechanism. In addition, kinematical studies revealed that covariation of the elevation 
angles of thigh, shank, and foot during walking displayed in three-dimensional space is 
approximately expressed on a plane (Lacquaniti et al., 1999), suggesting an important 
kinematical restriction for establishing cooperative motions. In designing a control system, 
adequate restrictions must be designed to achieve cooperative motions. 

Physiological studies have investigated gait transition from quadrupedal to bipedal 
locomotion to elucidate the origin of bipedal locomotion. Mori et al. (1996), Mori (2003), and 
Nakajima et al. (2004) experimented on gait transition using monkeys and investigated the 
physiological differences in the control system. Animals generate highly coordinated and 
skillful motions as a result of the integration of nervous, sensory, and musculoskeletal 
systems. Such motions of animals and robots are both governed by dynamics. Studies on 
robotics are expected to contribute the elucidation of the mechanisms of animals and their 
motion generation and control from a dynamical viewpoint. 

6. Acknowledgment 

This paper is supported in part by Center of Excellence for Research and Education on 
Complex Functional Mechanical Systems (COE program of the Ministry of Education, 
Culture, Sports, Science and Technology, Japan) and by a Grant-in-Aid for Scientific 
Research on Priority Areas "Emergence of Adaptive Motor Function through Interaction 
between Body, Brain and Environment" from the Japanese Ministry of Education, Culture, 
Sports, Science and Technology. 

7. References 

Akimoto, K.; Watanabe, S. & Yano, M. (1999). An insect robot controlled by emergence of 
gait patterns. Artificial Life and Robotics, Vol. 3, 102-105. 



Gait Transition from Quadrupedal to Bipedal Locomotion of an Oscillator-driven Biped Robot 29 



Altendorfer, R.; Moore, N.; Komsuoglu, H.; Buehler, M.; Brown Jr., H.; McMordie, D.; 

Saranli, U.; Full, R. & Koditschek, D. (2001). RHex: A biologically inspired hexapod 

runner. Autonomous Robots, Vol. 11, No. 3, 207-213. 
Aoi, S. & Tsuchiya, K. (2005). Locomotion control of a biped robot using nonlinear 

oscillators. Autonomous Robots, Vol. 19, No. 3, 219-232. 
Aoi, S.; Tsuchiya, K. & Tsujita, K. (2004). Turning control of a biped locomotion robot using 

nonlinear oscillators. Proc. IEEE Int. Conf. on Robotics and Automation, pp. 3043-3048. 
Cham, J.; Karpick, J. & Cutkosky, M. (2004). Stride period adaptation of a biomimetic 

running hexapod. Int. ]. Robotics Res., Vol. 23, No. 2, 141-153. 
d'Avella, A. & Bizzi, E. (2005). Shared and specific muscle synergies in natural motor 

behaviors. PNAS, Vol. 102, No. 8, 3076-3081. 
Fukuoka, Y.; Kimura, H. & Cohen, A. (2003). Adaptive dynamic walking of a quadruped robot on 

irregular terrain based on biological concepts. Int. ]. Robotics Res., Vol. 22, No. 3-4, 187-202. 
Grillner, S. (1981). Control of locomotion in bipeds, tetrapods and fish. Handbook of Physiology, 

American Physiological Society, Bethesda, MD, pp. 1179-1236. 
Grillner, S. (1985). Neurobiological bases of rhythmic motor acts in vertebrates. Science, Vol. 

228, 143-149. 
Hirai, K.; Hirose, M.; Haikawa, Y. & Takenaka, T. (1998). The development of the Honda 

humanoid robot. Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1321-1326. 
Ijspeert, A.; Crespi, A. & Cabelguen, J. (2005). Simulation and robotics studies of salamander 

locomotion. Applying neurobiological principles to the control of locomotion in 

robots. Neuroinformatics, Vol. 3, No. 3, 171-196. 
Inagaki, S.; Yuasa, H. & Arai, T. (2003). CPG model for autonomous decentralized multi- 
legged robot system-generation and transition of oscillation patterns and dynamics 

of oscillators. Robotics and Autonomous Systems, Vol. 44, No. 3-4, 171-179. 
Inoue, K.; Ma, S. & Jin, C. (2004). Neural oscillator network-based controller for meandering 

locomotion of snake-like robots. Proc. IEEE Int. Conf. on Robotics and Automation, pp. 

5064-5069. 
Ivanenko, Y.; Poppele, R. & Lacquaniti, F. (2004). Five basic muscle activation patterns 

account for muscle activity during human locomotion. /. Physiol., Vol. 556, 267-282. 
Ivanenko, Y.; Poppele, R. & Lacquaniti, F. (2006). Motor control programs and walking. 

Neuroscientist, Vol. 12, No. 4, 339-348. 
Kuniyoshi, Y.; Ohmura, Y.; Terada, K.; Nagakubo, A.; Eitokua, S. & Yamamoto, T. (2004). 

Embodied basis of invariant features in execution and perception of whole-body 

dynamic actions— knacks and focuses of Roll-and-Rise motion. Robotics and 

Autonomous Systems, Vol. 48, No. 4, 189-201. 
Kuroki, Y.; Fujita, M.; Ishida, T.; Nagasaka, K. & Yamaguchi, J. (2003). A small biped 

entertainment robot exploring attractive applications. Proc. IEEE Int. Conf. on 

Robotics and Automation, pp. 471-476. 
Lacquaniti, F.; Grasso, R. & Zago, M. (1999). Motor patterns in walking. News Physiol. Sci., 

Vol. 14, 168-174. 
Lewis, M. & Bekey, G. (2002). Gait adaptation in a quadruped robot. Autonomous Robots, Vol. 

12, No. 3, 301-312. 
Lewis, M.; Etienne-Cummings, R.; Hartmann, M.; Xu, Z. & Cohen, A. (2003). An in silico 

central pattern generator: silicon oscillator, coupling, entrainment, and physical 

computation. Biol. Cybern., Vol. 88, 137-151. 
Loffler, K.; Gienger, M. & Pfeiffer, F. (2003). Sensors and control concept of walking 

"Johnnie". Int. }. Robotics Res., Vol. 22, No. 3-4, 229-239. 



30 Humanoid Robots, New Developments 



Mori, S. (1987). Integration of posture and locomotion in acute decerebrate cats and in 

awake, free moving cats. Prog. Neurobioh, Vol. 28, 161-196. 
Mori, S. (2003). Higher nervous control of quadrupedal vs bipedal locomotion in non- 
human primates; Common and specific properties. Proc. 2nd Int. Symp. on Adaptive 

Motion of Animals and Machines. 
Mori, S.; Miyashita, E.; Nakajima, K. & Asanome, M. (1996). Quadrupedal locomotor 

movements in monkeys (M. fuscata) on a treadmill: Kinematic analyses. 

NeuroReport, Vol. 7, 2277-2285. 
Nagasaki, T.; Kajita, S.; Kaneko, K.; Yokoi, K. & Tanie, K. (2004). A running experiment of 

humanoid biped. Proc. IEEE/RS] Int. Conf. on Intelligent Robots and Systems, pp. 136-141. 
Nakajima, K.; Mori, F.; Takasu, C; Mori, M.; Matsuyama, K. & Mori, S. (2004). 

Biomechanical constraints in hindlimb joints during the quadrupedal versus 

bipedal locomotion of M. fuscata. Prog. Brain Res., Vol. 143, 183-190. 
Nakanishi, J.; Morimoto, J.; Endo, G.; Cheng, G.; Schaal, S. & Kawato, M. (2004). Learning 

from demonstration and adaptation of biped locomotion. Robotics and Autonomous 

Systems, Vol. 47, No. 2-3, 79-91. 
Orlovsky, G.; Deliagina, T. & Grillner, S. (1999). Neuronal control of locomotion: from mollusc to 

man. Oxford University Press. 
Patla, A.; Calvert, T. & Stein, R. (1985). Model of a pattern generator for locomotion in 

mammals. Am. ]. Physiol, Vol. 248, 484-494. 
Poulakakis, I.; Smith, J. & Buehler, M. (2005). Modeling and experiments of untethered 

quadrupedal running with a bounding gait: The Scout II robot. Int. J. Robotics Res., 

Vol. 24, No. 4, 239-256. 
Quinn, R.; Nelson, G.; Bachmann, R; Kingsley, D.; Offi, J.; Allen, T. & Ritzmann, R. (2003). 

Parallel complementary strategies for implementing biological principles into 

mobile robots. Int. J. Robotics Res., Vol. 22, No. 3, 169-186. 
Rossignol, S. (1996). Neural control of stereotypic limb movements. Oxford University Press. 
Saranli, U.; Buehler, M. & Koditschek, D. (2001). RHex: A simple and highly mobile hexapod 

robot. Int. ]. Robotics Res., Vol. 20, No. 7, 616-631. 
Taga, G. (1995a). A model of the neuro-musculo-skeletal system for human locomotion I. 

Emergence of basic gait. Biol. Cybern., Vol. 73, 97-111. 
Taga, G. (1995b). A model of the neuro-musculo-skeletal system for human locomotion II. - 

Real-time adaptability under various constraints. Biol. Cybern., Vol. 73, 113-121. 
Taga, G.; Yamaguchi, Y. & Shimizu, H. (1991). Self-organized control of bipedal locomotion 

by neural oscillators in unpredictable environment. Biol. Cybern., Vol. 65, 147-159. 
Takakusaki, K.; Habaguchi, T.; Ohtinata-Sugimoto, J.; Saitoh, K. & Sakamoto, T. (2003). Basal 

ganglia efferents to the brainstem centers controlling postural muscle tone and 

locomotion: A new concept for understanding motor disorders in basal ganglia 

dysfunction. Neuroscience, Vol. 119, 293-308. 
Takuma, T. & Hosoda, K. (2006). Controlling the walking period of a pneumatic muscle 

walker. Int. J. Robotics Res., Vol. 25, No. 9, 861-866. 
Tsujita, K.; Tsuchiya, K. & Onat, A. (2001). Adaptive gait pattern control of a quadruped locomotion 

robot. Proc. IEEE/RS] Int. Conf. on Intelligent Robots and Systems, pp. 2318-2325. 
Vukobratovic, M.; Borovac, B.; Surla, D. & Stokic, D. (1990). Biped locomotion— dynamics, 

stability, control and application. Springer-Verlag. 
Wisse, M.; Schwab, A.; van der Linde, R. & van der Helm, F. (2005). How to keep from 

falling forward: elementary swing leg action for passive dynamic walkers. IEEE 

Trans. Robotics, Vol. 21, No. 3, 393-401. 



Estimation of the Absolute Orientation of a 
Five-link Walking Robot with Passive Feet 

Yannick Aoustin, Gaetan Garcia, Philippe Lemoine 

Institut de Recherche en Communications et Cybernetique de Nantes (IRCCyN), 
Ecole Centrale de Nantes, Universite de Nantes, U.M.R. 6597, 1 rue de la Noe, 

BP 92101, 44321 Nantes Cedex 3, France, 
e-mail: surname.name@irccyn.ec-nantes.fr 



1. Introduction 

One of the main objectives of current research on walking robots is to make their gaits more 
fluid by introducing imbalance phases. For example, for walking robots without actuated 
ankles, which are under actuated in single support, dynamically stable walking gaits have 
been designed with success (Aoustin & Formal'sky 2003; Chevallereau et al. 2003; Zonfrilli 
et al. 2002; Aoustin et al. 2006; Pratt et al. 2001; Westervelt et al. 2003). Both the design of 
reference motions and trajectories and the control of the mechanism along these trajectories 
usually require the knowledge of the whole state of the robot. This state contains internal 
variables which are easy to measure using encoders for example, and also the absolute 
orientation of the robot with respect to the horizontal plane. For robots with unilateral 
constraints with the ground, it may not be possible to derive the latter information from 
internal measurements, as in the case of the absolute orientation of a biped in single 
support. Of course, technological solutions exist such as accelerometers, gyrometers, inertial 
units... But the implementation of these solutions is expensive and difficult. 
In order to overcome these difficulties, we propose to use a state observer which, based 
on the measurements of the joint variables and on a dynamic model of the robot, provides 
estimates of the absolute orientation of the walking robot during imbalance phases. This 
strategy was first validated in simulation for a three link biped without feet, using 
nonlinear high gain observers and a nonlinear observer based on sliding modes with a 
finite time convergence (Lebastard et al. 2006a) and (Lebastard et al. 2006b), for walking 
gaits composed of single support phases and impacts. The main drawback with this 
family of observers is that, when only some of the degrees of freedom are measured, a 
state coordinates transformation is necessary to design their canonical form (Gauthier & 
Bornard 1981; Krener & Respondek 1985; Bornard & Hammouri 1991; Plestan & 
Glumineau 1997). 

In this chapter, the observer is an extended Kalman filter and it is applied to 
SemiQuad , a prototype walking robot built at our institute. SemiQuad is a five link 
mechanism with a platform and two double-link legs. It is designed to study quadruped 
gaits where both front legs (resp. rear legs) have identical movements. Its unique front 
leg (resp. rear leg) acts as the two front legs (resp. rear legs) of the quadruped, so that 
SemiQuad can be considered as an implementation of a virtual quadruped, hence its 
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name. The legs have passive (uncontrolled) feet that extend in the frontal plane. Due to 
this, the robot is stable in the frontal plane. Four motors located in haunches and knees 
drive the mechanism. The locomotion of the prototype is a planar curvet gait. In double 
support, our prototype is statically stable and over actuated. In single support, it is 
unstable and under actuated. 

The reference walking gaits have been designed using an intuitive strategy which is such 
that the absolute orientation is not required. Still, it contains imbalance phases during which 
the observer can be used, and its results evaluated. The estimation results being correct, 
such an observer can be used for contexts where the absolute angle is necessary. 
Furthermore, the idea can be useful for other types of walking robots, such as bipeds with 
imbalance phases. 

The organization of this chapter is the following. Section 2 is devoted to the model of 
SemiQuad. It also contains the data of its physical parameters. The intuitive gaits which were 
designed for SemiQuad are presented in section 3. The statement of the problem of 
estimation of the absolute orientation of SemiQuad is defined in Section 4. Simulation results 
and experimental results avec presented in section 5. Section 6 presents our conclusions and 
perspectives. 

2. Presentation and dynamic models of SemiQuad 

2.1 SemiQuad 

The prototype (see figure 1) is composed of a platform and two identical double-link legs 
with knees. The legs have passive (uncontrolled) feet that extend in the frontal plane. Thus, 
the robot can only execute 2D motions in the sagittal plane, as considered here. There are 
four electrical DC motors with gearbox reducers to actuate the haunch joints between the 
platform and the thighs and the knee joints. Using a dynamic simulation, we have chosen 
parameters of the prototype (the sizes, masses, inertia moments of the links) and convenient 
actuators. The parameters of the four actuators with their gearbox reducers are specified in 
Table 1. The lengths, masses and inertia moments of each link of SemiQuad are specified in 
Table 2. 




Fig. 1. Photography of SemiQuad. 
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DC motor 
+gearbox 


length 
(m) 


mass 
(kg) 


gearbox 
ratio 


Rotor inertia 
(kg.m?) 


Electromagnetical 
torque constant (N.m)/A 


in haunch 
In knee 


0.23 
0.23 


2.82 
2.82 


50 
50 


3.25 10-5 

2.26 10-5 


0.114 
0.086 



Table 1. Parameters of actuators 



physical parameters 


mass 


length 


Center of mass 


Moment of inertia 


of each link 


(kg) 


(m) 


locations (m) 


(kg.m2) 

around the center 

of mass Ci (1=1,... ,5) 


links 1 and 5: shin 


mi = m 5 = 0.40 


li = Is = 0.15 


Si =s 5 = 0.083 


Ii= I 5 =0.0034 


link 3: platform 










+actuators in each haunch 


m 3 = 6.618 


1 3 = 0.375 


s 3 =0.1875 


I 3 = 0.3157 


links 2 and 4: thigh 










+ actuators in each knee 


rri2= rri4= 3.21 


1 2 =1 4 =0.15 


s 2 =s 4 = 0.139 


I 2 =I 4 = 0.0043 



Table 2. Parameters of SemiQuad 

The maximum value of the torque in the output shaft of each motor gearbox is 40 N.m . This 
saturation on the torques is taken into account to design the control law in simulation and in 
experiments. The total mass of the quadruped is approximately 14 kg. The four actuated 
joints of the robot are each equipped with one encoder to measure the angular position. The 
angular velocities are calculated using the angular positions. Currently the absolute 
platform orientation angle a (see figure 2) is not measured. As explained before, the 
proposed walking gait does not require this measurement. Each encoder has 2000 pts/rev 
and is attached directly to the motor shaft. The bandwidth of each joint of the robot is 
determined by the transfer function of the mechanical power train (motors, gearboxes) and 
the motor amplifiers that drive each motor. In the case of SemiQuad, we have approximately 
a 16 Hz bandwidth in the mechanical part of the joints and approximately 1.7 kHz for the 
amplifiers. The control computations are performed with a sample period of 5 ms (200 Hz). 
The software is developped in C language. 
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Fig. 2. SemiQuad' s diagram: generalized coordinates, torques, forces applied to the leg tips, 
locations of mass centers. 
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2.2 Dynamic model of SemiQuad 

Figure 2 gives the notations of the torques, the ground reactions, the joint variables and the 
Cartesian position of the platform. Using the equations of Lagrange, the dynamic model of 
SemiQuad can be written: 

D.q.+H e q.+G.=B.r + D Ri R 1 +D Ri R 2 (1) 

with, q e - q 1 x z . The vector q is composed of the joint actuated variables and the 
absolute orientation of the platform, q = [6j 6 2 6 3 6 4 a] ; (x ,z J are the Cartesian 
coordinates of the platform center. The joint angle variables are positive for counter 
clockwise motion. D e (q) (7x7) is the inertia positive definite matrix. The matrix 

H e (q,q) (7x7) represents the Coriolis and centrifugal forces and G e (q) (7x1) is the vector of 
the gravity forces. B e is a constant matrix composed of ones and zeros. Each matrix 
D R (q) (7x2) ( j =1,2 ) is a jacobian matrix which represents the relation between feet 

Cartesian velocities and angular velocities. It allows to take into account the ground reaction 
R on each foot. If foot j is not in contact with the ground, then R, =[0,0] . The 

term r -[rj T 2 T 3 T 4 ] is the vector of four actuator torques. Let us assume that during 
the single support, the stance leg acts as a pivot joint. Our hypothesis is that the contact of 
the swing leg with the ground results in no rebound and no slipping of the swing leg. Then, 
in single support, equation (1) can be simplified and rewritten as: 

Dq + Hq + G=Br (2) 

1 T 
As the kinetic energy K = — q Dqdoes not depend on the choice of the absolute frame (see 

(Spong, M. & Vidyasagar M. 1991)) and furthermore variable a defines the absolute orientation 
of SemiQuad, the inertia matrix D (5x5) does not depend on a , and then D = T)(Q 1 ,Q 1 ,Q i ,Q A ) . As 
for the case of equation (1), the matrix H(q,q) (5x5) represents the Coriolis and centrifugal 
forces and G(q) (5x1) is the vector of gravity forces. B is a constant matrix composed of ones 
and zeros. Equation (2) can be written under the following state form: 



q 

D _1 (-Hq-G + Br) 



f(x) + g(q rel )r (3) 



With x = Tq T q T l and the joint angle vector q^-pj 8 2 8 3 6 4 ] . The state space is 
defined as xeXc R 10 = \ x = Tq T q T 1 Iql < q M < °°; q e ]-7l,7l]" \ . From these definitions, it 
is clear that all state coordinates are bounded. 

2.3 Discrete Model 

Our objective is to design an extended Kalman filter with a sampling period A to observe the 

absolute orientation a . Then it is necessary to find a discrete model equivalent to (3). A 

x — x 
possible solution is to write x = — — and (3) becomes: 



Estimation of the Absolute Orientation of a Five-link Walking Robot with Passive Feet 35 



x k+1 =x k +A(f(x k ) + g(q relk )r k ) (4) 

For SemiQuad, we have preferred to sample the generalized coordinates and their velocities 
using approximations to a different order by a Taylor development: 

x(t + A) = x(t) + Ax(t) + A 2 ^ + ... + A n ^)+£, (5) 

2! n! 

From (5), developments to second order and first order have been used for the generalized 
coordinates and their velocities, respectively. Limiting the order where possible limits the 
noise, but using second order developments for position variables introduces their second 
order derivative and allows to make use of the dynamic model. 

t:H:M£K(ot) 

With this method, a discrete model of (3) is calculated. If we add the equation 
corresponding to the measurement of the joint angles, we obtain the following system: 

x k + i=4( x k> r k) 

T (7) 

y k = h(x k ) = (e, e 2 e 3 e 4 ) =cx k withc = i 4x4 

2.4 Impulsive impact equation 

The impact occurs at the end of the imbalance phase, when the swing leg tip touches the 
ground, i.e. : 

xe S ={xe X q=q f } where q f is the final configuration of SemiQuad. The instant of impact 
is denoted by T f . Let us use the index 2 for the swing leg, and 1 for the stance leg during the 
imbalance phase in single support. We assume that: 

♦ the impact is passive, absolutely inelastic; 

♦ the swing leg does not slip when it touches the ground; 

♦ the previous stance leg does not take off; 

♦ the configuration of the robot does not change at the instant of impact. 

Given these conditions, at the instant of an impact, the contact can be considered as 
impulsive forces and defined by Dirac delta-functions R j =I R A(t-T i ) (j=l, 2). Here 



Ir = Ir . Ir ' i s the vector of the magnitudes of the impulsive reaction in leg j 
(Formal'sky 1974). The impact model is calculated by integrating (1) between Tf (just before 
impact) and T* (just after impact) . The torques provided by the actuators at the joints, and 

Coriolis and gravity forces, have finite values, thus they do not influence the impact. 
Consequently the impact equation can be written in the following matrix form: 

De(q)(q:-q;) = D Ri (q)I R| +D R2 (q)I Rj (8) 

Here, q is the configuration of SemiQuad at t = T s , q~ and q* are the angular velocities just 
before and just after impact, respectively. Furthermore, the velocity of the stance leg tip 
before impact is null. Then we have: 
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D^q;=0 2xl (9) 

After impact, both legs are stance legs, and the velocity of their tip becomes null: 






qI=o 4xl (io) 



3. Walking gait strategy for SemiQuad 

By analogy with animal gaits with no flight phase, SemiQuad must jump with front or back leg 
to realize a walking gait. There is no other way to avoid leg sliding. Then it is necessary to take 
into account that SemiQuad is under actuated in single support and over actuated in double 
support. Let us briefly present the adopted strategy to realize a walking gait for SemiQuad. It 
was tested first in simulation to study its feasibility and then experimentally (Aoustin et al. 
2006). A video can be found on the web page http://www.irccyn.ec- 
nantes.fr/irccyn/d/en/equipes/Robotique/Themes/Mobile. Figure 4 represents a sequence of stick 
configurations for one step to illustrate the gait. Let us consider half step n as the current half 
step, which is composed of a double support, a single support on the rear leg and an impact on 
the ground. During the double support, SemiQuad has only three degrees of freedom. Then its 
movement can be completely prescribed with the four actuators. A reference movement is 
chosen to transfer the platform centre backwards. This is done by defining a polynomial of 
third order for both Cartesian coordinates x p and z p . The coefficients of these polynomials are 
calculated from a choice of initial and final positions, of the initial velocity and an intermediate 
position of the platform centre. The reference trajectories for the four actuated joint variables 
are calculated with an inverse geometric model. Then, by folding and immediately thrusting 
the front leg, the single support phase on the rear leg starts. During this imbalance phase, 
SemiQuad has five degrees of freedom and its rotation is free around its stance leg tip in the 
sagittal plane. Reference trajectories are specified with third order polynomial functions in 
time for the four actuated inter-link angles. However, the final time T p of these polynomial 
functions is chosen smaller than the calculated duration T ss of the single support phase, such 
that before impact the four desired inter-link angles (8 id i = 1,...,4) become constant. Using this 

strategy, we obtain the desired final configuration of our prototype before the impact even if 
T ss is not equal to the expected value. 

6 id =a + ai t + a 2 t 2 +a 3 t 3 if t<T p 

e ld =e ie (T p ) if t p < t <t ss 

The coefficients of these polynomials are calculated from a choice of initial, intermediate and 
final configurations and of the initial velocity for each joint link. Just after impact, the next 
half step begins and a similar strategy is applied (figure 4). The tracking of the reference 
trajectories is achieved using a PD controller. The gains, which were adjusted using pole 
placement, were tested in simulation and in experiments. Figure 3 shows the evolutions of 
the absolute orientation variable a(t) and its velocity d(t) , obtained from the simulation of 
SemiQuad for five steps. These graphs clearly show that the dynamics of the absolute 
orientation cannot be neglected in the design of a control law based on a state feedback. The 
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durations of the double support phase and the single support phase are 1.5 s and 0.4 s 
respectively. 
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Fig. 3. Evolution of a(t) and d(t) in simulation during the walking gait for five steps. 



4. Observation of the absolute orientation of SemiQuad 

System (3) is generically observable if the following matrix O has a full rank (see (Plestan & 
Glumineau 1997)), which is equal to 10, with ( ki,...,k P ) the p observability indexes. 



0= dh, 



dL? 



dh. 



dL* 



r\) 



(12) 



where dh is the gradient vector of function h (see system (7)) with respect to the state vector 

x, and dLfh is the Lie derivative of h along the vector field f . We have also checked that the 

equivalent property is satisfied by the discrete model. This means that, at each sampling 

time tk, it is possible to find an observability matrix with 10 independent rows or columns. 

Having checked system observability, we propose an extended Kalman filter to observe the 

absolute orientation. The design of this extended Kalman filter is now described. 

In practice, due to uncertainty in the determination of parameters and to angular 

measurement errors, system (3), and of course system (7), are only approximations. A 

standard solution is to represent modelling and measurement errors as additive noises 

disturbing the system. 

Let us consider the associated "noisy" system: 



: 4(x k ,r k ) + a k 



yk=c(><k) : 



6, 8, 6, 



ft 



(13) 



In the case of a linear system, if a k and P k are white Gaussian noises, mutually independent 
and independent of x, the Kalman filter is an optimal estimator. When the system is not linear, 
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it is possible to use the extended Kalman filter (EKF) by linearizing the evolution equation f s 
and the observation equation (which is in our case linear) around the current state estimate. 
Although convergence and optimality are no longer guaranteed, the interest and the 
effectiveness of the extended Kalman filter have been proved in many experimental cases. The 
extended Kalman filter is very often used as a state observer (Bonnifait & Garcia 1998). 



(a) Double support (half step n) : 
The projection of the platform center 
is halfway between the leg tips 



(f) Double support (end of half step n and start of half step n+1): 

Just after landing with an impact of the front leg. After half step 

n, the platform center has moved forward. 




(b) Double support (half step n) : 

The projection of the platform center is 

closer to the back leg tip 



(g) Double support (half step n+1) : 

The projection of the platform center is 

closer to the front leg tip. 





(c) Double support (half step n): 
The front leg is unbent just before 
take off (before the single support) 



(h) Double support (half step n+1): 

The back leg is unbent just before take off 

(before the next single support phase) 




(d) Single support (half step n): 

Just after jump of the front leg, 

the front leg is bent. 




(i) Single support (half step n+1) : 

Just after jump of the back leg, 

the back leg is bent. 




(c) Single support (half step n): 

The distance between the leg tips 

is larger than in the previous 

double support phase. 




(j) Single support (half step n+1): 

The distance between the leg tips is smaller than 

in the previous double support phase. 



Fig. 4. Plot of half steps n and n+1 of SemiQuad. as a sequence of stick figures. 
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In the case of our system, the equations of the extended Kalman filter are: 

• a priori estimation: uses data available before tk+i 

Xk + l= f s (Xk> r k) 

• a posteriori estimation: uses data available at tk+i 

x k+1 =x k+1 +K k+1 (y ktl -Cx k+1 ) 



(14) 



with: 



P =(I -K C)P~ 

*-k + l V 10x10 1 ^k + l^-y ± k + 



A- =lg-| and^.^p-.C-'iCP-.C 1 



(15) 



Here y k +i are the measured joint variables, which are the first four components of vector x k 
at time tk, and Cx k+1 is the a priori estimation of these joint variables. Q a and Q„ are the 
covariance matrices of a k and P k , K is the Kalman gain and P the estimated covariance 
matrix of prediction ( P" at tk) and estimation ( P at tk) errors. 

5. Simulation and experimental results. 

For the simulation and experimental tests, the physical parameters defined in section 2 are 
used. The sampling period A is equal to 5 ms. The incremental encoders having N=2000 
points per revolution, the variance of measurement is taken equal to n 2 /(3AN 2 ) =8.225 1CH. 
The errors of incremental encoders being independent, we have chosen Q„ =8.225 1CH I 4x4 . 
The components of Q a for the configuration variables are determined by trial and error 
from simulation and experimental results. The components of Q a for velocity variables are 

derived from the values for position variables, taking into account the sampling period, and 
are larger than those corresponding to position variables. 

3.0 10- 10 I 5x5 5x5 

5x5 3.0 10- 5 I 5x5 

The initialization of the covariance matrix P takes into account a determination of a less precise 
and fixes the variances on the velocities, as for Q a , taking into account of the sampling period. 



Po 



^8 10- 10 I 4x4 4xl 4x5 

lx4 1.7 10- 5 lx5 

5x4 5xl 510- 2 I 5x5y 



All the tests in simulations and in experiments were done following the scheme of figure 5. In 
simulations, the joint variables 6 ; and their velocities 8 t (i=l,2,3,4) obtained by integration of 
the direct dynamic model of SemiQuad and the control torques I", (i=l,2,3,4) are the inputs of 
the extended Kalman filter. For the experimental tests, the joint variables 6[ (i= 1,2,3,4) are 
measured and differentiated to obtain the velocities 8j . These eight variables, together with 



40 



Humanoid Robots, New Developments 



the four experimental torques T i (i=l,2,3,4) are the inputs of the extended Kalman filter. In 
both cases, the state vector ["9999 



8, 6 2 8, 6 4 d] T is estimated. 



r, (i = l,2,3,4) 




9, 9/ (i = 1,2,3,4) 

> 



T, (1 = 1,2,3,4) 


Estended 

Kalman 

Filter 


a, a, 9, e, ( 






1 = 1,2,3,4) 


6, 6, (i = 1,2,3,4) 











Fig. 5. Principle of tests of the extended Kalman filter with SemiQuad. 

5.1 Simulation results 

Figure 6 shows the evolution of the estimated and real orientations of the platform during a 
single support phase of the cyclic walking gait of SemiQuad. The initial error, which has been 
set to 0.0262 rad (1.5 degree), is rapidly reduced, andthe estimated orientation converges 
towards the real orientation. Let us notice that a maximum value of 1.5 degree for the initial 
error is reasonable because experimentally d(0) is determined by the geometric model at 
the end of the previous double support. 

In the model used by the observer, we do not consider any friction. We have 
performed robustness tests of our estimator by adding a viscous friction, 1^ =F v 9 i 

(i=l, 2,3,4), and a Coulomb friction r\ = F B 6, (i=l, 2,3,4) to the simulation model. Figure 
7 shows the estimated and real orientations of the platform of SemiQuad in the case 
when a viscous friction is added. The coefficient F v equals to 0.1 N.m.s/rad. Similarly, 
figure 8 shows the estimated and the real orientations of the platform of SemiQuad in 
the case of a Coulomb friction, with a coefficient F s equal to 0.2 N.m. Last robustness 
test (figure 9) presents the estimated and real orientations of the platform of 
SemiQuad with an inertia reduced by 5% for the platform in the simulator. In practice, 
5% precision on inertia is feasible (see identification results in (Lydoire & Poignet 
2003)). 

From these robustness tests, we can conclude that we have no finite time convergence. 
However, the final relative errors of the estimated orientations of the platform of SemiQuad 
are small. Since it will be possible to update the initial condition of the estimator during the 
next double support phase, with the measurements of the encoders and the geometrical 
relations, such errors are not a problem. 



5.2 Experimental results 

At each step, the estimator is initialized with the configurations and the velocities at the 
end of the previous double support phase. At each sampling time, velocities are obtained 
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by the usual differentiation operation B { -— — — - (i=l,2,3,4). No filtering is 

applied to smooth the measured joint variables Q t , their velocities 8 t and the four torques 
I\ (i=l,2,3,4). 
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Fig. 6. Absolute orientation a(t) of the platform: real (solid) and estimated (dashed). 
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Fig. 7. Absolute orientation a(t) of the platform: real (solid) and estimated (dashed), with a 
supplement viscous friction. 
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Fig. 8. Absolute orientation a(t) of the platform: real (solid) and estimated (dashed), with a 
supplement Coulomb friction. 
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Fig. 9. Absolute orientation a(t) of the platform: real (solid) and estimated (dashed), with an 
error on the inertia of the platform. 



Estimation of the Absolute Orientation of a Five-link Walking Robot with Passive Feet 



43 



Figure 10 shows the estimation of the absolute orientation of the platform a(t) . The 
duration of the single support phase is 15% smaller than in simulation. It can probably be 
explained by the fact that our knowledge of the physical parameters of the robot is not 
perfect, and that we neglected effects such as friction in the joints. 

Currently, there is no experimental measured data about the evolution of a(t) in single 
support, because SemiQuad is not equipped with a sensor such as a gyrometer or a 
gyroscope. However, in double support, using the geometric and kinematic models it is 
possible to calculate a(t) and d(t) . In addition to providing initial conditions for the 

observer, this also allows to calculate the orientation at the end of the single support phase, 
just after impact. The corresponding value is displayed as a star on the next graph, and is 
equal to 0.01 rad (0.57 degree). The difference between this value and the estimated value at 
the same instant is almost 3 degrees. 
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Fig. 10. Estimation of the absolute orientation a(t) of the platform using the experimental data. 



6. Conclusion 

An application of the digital extended Kalman filter has been presented for the problem of 
estimating the absolute orientation of SemiQuad, a 2D dynamically stable walking robot. There 
are some differences between simulations and experiments, but the results still prove the 
ability of our observer to yield a short term estimation of the orientation of the robot. The 
precision is sufficient for the estimation to be useful for calculating the dynamic model and 
monitoring the balance of the robot. This task is important for SemiQuad, and vital for bipeds, 
to which the idea is also applicable (see Lebastard, Aoustin, & Plestan 2006 for a different type 
of observer). Given the possibility to re-initialize the observer at each double support phase, 
even if very short, as it can be for bipeds, the observer can provide the absolute orientation 
over time without the use of any additional sensor, which was the goal of our work. 
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In a near future, we plan to equip SemiQuad with a gyrometer to fully evaluate the 
performance of our estimator over time. Our perspective is to use the estimated orientation 
in advanced feedback controllers. 
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1. Introduction 



Children love toys. Human caregivers often employ learning aids, such as books, 
educational videos, drawing boards, musical or textured toys, to teach a child. These social 
interactions provide a rich plethora of information to a child, and hence they should be 
extrapolated to a humanoid robot as well (Arsenio, 2004d). 

Inspired in infant development, we aim at developing a humanoid robot's perceptual 
system through the use of learning aids, cognitive artifacts, and educational activities, so 
that a robot learns about the world according to a child's developmental phases (Arsenio, 
2004c). Of course, the human caregiver plays a very important role on a robot's learning 
process (as it is so with children), performing educational and play activities with the robot 
(such as drawing, painting or playing with a toy train on a railway), facilitating robot's 
perception and learning. The goal is for the humanoid robot Cog (Figure 1) to see the world 
through the caregiver's eyes. 




Fig. 1. The Humanoid Robot Cog. 



This chapter will begin by addressing learning scenarios on section 2, which are employed 
to enable the robot to acquire input data in real-time to train learning algorithms. Tools 
developed to acquire such data, such as object / scene /texture segmentation (Arsenio, 



1 Research work was developed while the author was at MIT. The author is currently working at 
Siemens. 
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2004e), sound / face segmentation (Arsenio, 2004f), object / face / hand tracking (Arsenio, 
2004e;f) or even robot actuation for active perception (Arsenio, 2003; Metta & Fitzpatrick, 
2003) are described in the literature. 

We do not treat children as machines, i.e., automatons. But this automaton view is still 
widely employed in industry to build robots. Building robots involves indeed the hardware 
setup of sensors, actuators, metal parts, cables, processing boards, as well as software 
development. Such engineering might be viewed as the robot genotype. But equally 
important in a child is the developmental acquisition of information in a social and cultural 
context (Vigotsky, 1962). 

Therefore, for a humanoid robot to interact effectively in its surrounding world, it must be 
able to learn. Section 3 presents learning strategies applied to a diverse set of problems, so 
that the robot learns information about objects, scenes, people and actions. Training data for 
the algorithms is generated on-line, in real-time, while the robot is in operation. We will 
describe the development of these learning mechanisms, presenting statistics from a large 
plethora of experimental results. 

We aim at introducing robots into our society and treating them as us, using child 
development as a metaphor for developmental learning of a humanoid robot 

2. Children-like Social Interactions as Learning Scenarios for a Humanoid 

An autonomous robot needs to be able to acquire and incrementally assimilate new 
information, to be capable of developing and adapting to its environment. The field of 
machine learning offers many powerful algorithms, but these often require off-line, 
manually inserted training data to operate. Infant development research suggests ways to 
acquire such training data from simple contexts, and use these experiences to bootstrap to 
more complex contexts. We need to identify situations that enable the robot to 
temporarily reach beyond its current perceptual abilities (Figure 2 shows a set of such 
situations), giving the opportunity for development to occur (Arsenio, 2004c;d; Metta & 
Fitzpatrick, 2003). 

This led us to create children-like learning scenarios for teaching a humanoid robot. These 
learning experiments are used for transmitting information to the humanoid robot Cog to 
learn about objects' multiple visual and auditory representations from books, other learning 
aids, musical instruments and educational activities such as drawing and painting. 
Our strategy relies heavily in human-robot interactions. For instance, it is essential to have a 
human in the loop to introduce objects from a book to the robot (as a human caregiver does 
to a child). A more rich, complete human-robot communication interface results from 
adding other aiding tools to the robot's portfolio (which facilitate as well the children' 
learning process). 

This is achieved by selectively attending to the human actuator (hand or finger). 
Indeed, primates have specific brain areas to process the hand visual appearance 
(Perrett et al., 1990). Inspired by human development studies, emphasis will be placed 
on facilitating perception through the action of a human instructor, and on developing 
machine-learning strategies that receive input from these human-robot interactions 
(Arsenio, 2004a,d). 

Multi-modal object properties are learned using these children educational tools, and 
inserted into several recognition schemes, which are then applied to developmentally 
acquire new object representations (Arsenio, 2004c). 
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Fig. 2. Cog learning from several social interactions. 



2.1 Robot Skill Augmentation through Cognitive Artifacts 

A human caregiver can introduce a robot to a rich world of visual information concerning 
objects' visual appearance and shape. But cognitive artifacts, which enhance perception, can 
also be applied to improve perception over other perceptual modalities, such as auditory 
processing. 

We exploit repetition (rhythmic motion, repeated sounds) to achieve segmentation and 
recognition across multiple senses (Arsenio, 2004d;f). We are interested in detecting 
conditions that repeat with some roughly constant rate, where that rate is consistent with 
what a human can easily produce and perceive. This is not a very well defined range, but 
we will consider anything above 10Hz to be too fast, and anything below 0.1Hz to be too 
slow. Repetitive signals in this range are considered to be events in our system: waving a 
flag is an event, but the vibration of a violin string is not an event (too fast), and neither is 
the daily rise and fall of the sun (too slow). Such a restriction is related to the idea of natural 
kinds, where perception is based on the physical dimensions and practical interests of the 
observer. 

Abrupt motions, such as a poking movement, which involve large variations of movement, 
are also used to extract percepts (Arsenio, 2003; Arsenio 2004c;d). 

Teaching from Books 

Learning aids are often used by human caregivers to introduce the child to a diverse set of 
(in)animate objects, exposing the latter to an outside world of colors, forms, shapes and 
contrasts, that otherwise might not be available to a child (such as images of whales and 
cows). Since these learning aids help to expand the child's knowledge of the world, they are 
a potentially useful tool for introducing new informative percepts to a robot. 
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Fig. 3. Object templates extracted from books. 

Children's learning is hence aided by the use of audiovisuals, and especially books, 
during social interactions with their mother or caregiver. Indeed, humans often paint, 
draw or just read books to children during their childhood. Books are also a useful tool to 
teach robots different object representations and to communicate properties of unknown 
objects to them. 

Figure 3 shows images of object templates extracted from books using an active object 
segmentation algorithm - active in the sense that a human points to the object in a book 
with his/her finger (Arsenio, 2004e). This human aided perceptual grouping algorithm 
extracts informative percepts from picture books (fabric, foam or cardboard books), by 
correlating such information with a periodically moving human actuator (finger), resulting 
on signal samples for objects' image templates. Whenever the interacting human makes 
repetitive sounds simultaneously, object sound signatures, as well as cross-modal 
signatures, are segmented as well (Fitzpatrick & Arsenio, 2004). This data is employed 
afterwards as inputs for learning (section 3). 
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Matching Geometric Patterns: Drawings, Paintings, Pictures ... 

Object descriptions may come in different formats - drawings, paintings, photos, et 
cetera. Hence, the link between an object representation in a book and real objects 
recognized from the surrounding world can be established through object recognition. 
Objects will be recognized using geometric hashing (section 3), a widely used 
recognition technique. The algorithm operates on three different set of features: 
chrominance and luminance topological regions, and shape (determined by an object's 
edges), as shown by Figure 4. Except for a description contained in a book, which was 
previously segmented, the robot had no other knowledge concerning the visual 
appearance or shape of such object. 

Additional possibilities include linking different object descriptions in a book, such as a 
drawing, as demonstrated by two samples of the experimental results presented in Figure 4. 
A sketch of an object contains salient features concerning its shape, and therefore there are 
advantages in learning, and linking, these different representations. This framework is also 
a useful tool for linking other object descriptions in a book, such as a photo, a painting, or a 
printing (Arsenio, 2004a;d). 




Fig. 4. Matching objects from books to real world objects and drawings. 



Explorations into the World of Tools and Toys 

A plethora of other educational tools and toys are widely used by educators to teach 
children, helping them to develop. Examples of such tools are toys (such as drawing 
boards), educational TV programs or educational videos. The Baby Einstein collection 
includes videos to introduce infants and toddlers to colors, music, literature and art. Famous 
painters and their artistic creations are displayed to children on the Baby Van Gogh video, 
from the mentioned collection. This inspired the design of learning experiments in which 
Cog is introduced to art using an artificial display (the computer monitor), as shown in 
Figure 5 (Arsenio, 2004d). 
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Computer monitor 




Segmentation Tree Sky Field Ground 

Fig. 5. The image of a painting by Vincent Van Gogh, Road with Cypress and Star, 1890 is 
displayed on a computer screen. Paintings are contextually different than pictures or 
photos, since the painter style changes the elements on the figure considerably. Van Gogh, a 
post-impressionist, painted with an aggressive use of brush strokes. But individual painting 
elements can still be grouped together by having a human actor tapping on their 
representation in the computer screen to group them together. 

Learning First Words 

Auditory processing is also integrated with visual processing to extract the name and 
properties of objects. However, hand visual trajectory properties and sound properties 
might be independent - while tapping on books, it is not the interacting human caregiver 
hand that generates sound, but the caregiver vocal system pronouncing sounds such as the 
object's name. Therefore, cross-modal events are associated together under a weak 
requirement: visual segmentations from periodic signals and sound segmentations are 
bound together if occurring temporally close (Fitzpatrick & Arsenio, 2004). This strategy is 
also well suited for sound patterns correlated with the hand visual trajectory (such as 
playing musical tones by shaking a rattle). 



2.2 Educational, Learning Activities 

A common pattern of early human-child interactive communication is through activities 
that stimulate the child's brain, such as drawing or painting. Children are able to extract 
information from such activities while they are being performed on-line. This capability 
motivated the implementation of three parallel processes which receive input data from 
three different sources: from an attentional tracker, which tracks the robot's attentional 
focus, and it is attracted to a new salient stimulus; from a multi-target tracking algorithm 
implemented to track simultaneously multiple targets; and from an algorithm that 
selectively attends to the human actuator (Arsenio, 2004d). 

Learning Hand Gestures 

Standard hand gesture recognition algorithms require an annotated database of hand 
gestures, built off-line. Common approaches, such as Space-Time Gestures (Darrel & 
Pentland, 1993), rely on dynamic programming. Others (Cutler & Turk, 1998) developed 
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systems for children to interact with lifelike characters and play virtual instruments by 
classifying optical flow measurements. Other classification techniques include state 
machines, dynamic time warping or Hidden Markov Models (HMMs). 

We follow a fundamentally different approach, being periodic hand trajectories mapped 
into geometric descriptions of objects, to classify simple circular or triangular movements, 
for instance (Arsenio, 2004a). Figure 6b reports an experiment in which a human draws 
repetitively a geometric shape on a sheet of paper with a pen. The robot learns what was 
drawn by matching one period of the hand gesture to the previously learned shape (the 
hand gesture is recognized as circular in the Figure). Hence, the geometry of periodic hand 
trajectories is recognized in real-time to the geometry of objects in an object database, 
instead of being mapped to a database of annotated gestures. 




Fig. 6. Sample of experiments for object and shape recognition from hand gestures. 

Object Recognition from Hand Gestures. 

The problem of recognizing objects in a scene can be framed as the dual version of the hand 
gestures recognition problem. Instead of using previously learned object geometries to 
recognize hand gestures, hand gestures' trajectories are now applied to recover the 
geometric shape (set of lines computed by applying the Hough Transform) and appearance 
(given by an image template enclosing such lines) of a scene object (as seen by the robot). 
Visual geometries in a scene (such as circles) are recognized as such from hand gestures 
having the same geometry (as is the case of circular gestures). Figure 6a shows results for 
such task on an experiment in which an interacting human paints a circle. The robot learns 
what was painted (a circle) by matching the hand gesture to the shape defined by the ink on 
the paper. The algorithm is useful to identify shapes from drawing, painting or other 
educational activities (Arsenio, 2004d). 

Shape from Human Cues 

A very similar framework is applied to extract object boundaries from human cues. Indeed, 
human manipulation provides the robot with extra perceptual information concerning 
objects, by actively describing (using human arm/ hand/finger trajectories) object contours 
or the hollow parts of objects, such as a cup (see experiment with green cup in Figure 6c). 
Tactile perception of objects from the robot grasping activities has been actively pursued - 
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see for instance (Polana & Nelson, 1994; Rao et al., 1989). Although more precise, these 
techniques require hybrid position/ force control of the robot's manipulator end-effector so 
as not to damage or break objects. 

Functional Constraints 

Not only hand gestures can be used to detect interesting geometric shapes in the world as 
seen by the robot. For instance, certain toys, such as trains, move periodically on rail tracks, 
with a functional constraint fixed both in time and space. Therefore, one might obtain 
information concerning the rail tracks by observing the train's visual trajectory. To 
accomplish such goal, objects are visually tracked by an attentional tracker which is 
modulated by an attentional system (Arsenio, 2004d). The algorithm starts by masking the 
input world image to regions inside the moving object's visual trajectory (or outside but on a 
boundary neighborhood). Lines modeling the object's trajectory are then mapped into lines 
fitting the scene edges. The output is the geometry of the stationary object which is 
imposing the functional constraint on the moving object. Figure 6d shows as well an 
experiment for the specific case of extracting templates for rail tracks from the train's motion 
(which is constrained by the railway circular geometry). 



2.3 Learning about People 

Faces in cluttered scenes are located by a computationally efficient algorithm (Viola & Jones, 
2001), which is applied to each video frame (acquired by a foveal camera). If a face is detected, 
the algorithm estimates a window containing that face, as shown in Figure 7. The novelty here 
consists on acquiring a large amount of samples of training data in real-time using a multi- 
object tracking algorithm (Arsenio, 2004d), which allows to group several image templates 
together - from different views of the same tracked face - into the same group. 



(X£lJd«|iT lifH.V • 




9 



AtlffliiLinal 



I Dhjcct fldcrtmi face p*i«Uim'j 

( i rf 



m® 



:7)cfc» 



nbjis.i 
j-.. .... 




Fig. 7. Approach for segmenting and recognizing faces & objects. Training data for 
object/face recognition is extracted by keeping objects and others faces in memory for a 
while, generating a collection of training samples consisting of multiple segmentations of 
objects and faces, (left) on-line experiment on Cog (right) schematic organization. 1. Object 
segmentation 2. Face detection and segmentation 3. Multiple object tracking 4. Object 
Recognition 5. Face Recognition. 
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3. Machine Learning Algorithms 

This section presents a collection of machine learning algorithms and methodologies 
implemented to emulate different cognitive capabilities on the humanoid robot Cog. This 
framework is effectively applied to a collection of AI, computer vision, and signal 
processing problems. It is shown to solve a broad spectrum of machine learning problems 
along a large categorical scope: actions, objects, scenes and people, using real-time data 
acquired from human-robot interactions as described by last section's learning scenarios. 
Most of the learning algorithms introduced receive as training inputs information produced 
by other modules, such as object segmentation (Arsenio, 2004e). But all this training data is 
automatically annotated on-line, in real-time, instead of the standard off-line, manual 
annotation. This is accomplished by having human caregivers introducing new percepts to 
the robot (Arsenio, 2004a). Hence, we are motivated by cognitive development of human 
infants, which is bootstrapped by the helping hand that a human caregiver (and especially 
the infant's mother) provides to the infant (Arsenio, 2004c). 

This chapter does not intend to propose new learning algorithms. Instead, the focus is 
placed on using existing learning algorithms to solve a wide range of problems. 
One essential capability for a humanoid robot to achieve convincing levels of competency is 
object recognition. Therefore, a learning algorithm operating on object color histograms is 
first presented. A more robust algorithm employing geometric hashing techniques is also 
described. These algorithms are not however appropriate to tackle the face recognition 
problem, which is solved using the eigenfaces method. A similar eigenobjects based 
algorithm will be applied as well for sound recognition, using eigensounds. 
Recognition of scenes is especially important for robot localization. An approach based on a 
contextual description of the scene envelope is also described. Contextual descriptions of a 
scene are modeled by a Mixture of Gaussians, being the parameters of such mixture 
estimated iteratively using the Expectation-Maximization algorithm. 

The processing of cross-modal information, among different sensorial capabilities, leads to 
an innovative cross-modal object recognition scheme using a Dynamic Programming 
approach. Contextual features provide another source of information very useful to 
recognize objects - we apply a method similar to a mixture of experts: weighted cluster 
modeling. Another technique employed is Back-propagation Neural Networks for activities' 
identification (and for identifying the function of an object within an activity). This learning 
method is also shown to increase sound recognition rates compared to the eigensounds 
method. Both qualitative and quantitative experimental results are evaluated and discussed 
for each algorithm. Section 3 ends by referring briefly other learning strategies employed by 
the humanoid robot Cog, applied not only for perception but also for robot control by 
learning the underlying dynamic models. 

3.1 Color Histograms 

The object recognition algorithm needs to cluster object templates by classes according to 
their identity. Such task was implemented through color histograms - objects are classified 
based on the relative distribution of their color pixels. Since object masks are available from 
segmentation (Arsenio, 2004e), external global features do not affect recognition, and hence 
color histograms are appropriate. A multi-target tracking algorithm (Arsenio, 2004d) keeps 
track of object locations as the visual percepts change due to movement of the robot's active 
head. Ideally, a human actor should expose the robot to several views of the object being 
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tracked (if the object appearance is view-dependent), in order to link them to the same 
object. This way, a collection of object views becomes available as input. 

Recognition works as follows. Quantization of each of the three color channels originates 8 3 
groups G; of similar colors. The number of image pixels nGi indexed to a group is stored as a 
percentage of the total number of pixels. The first 20 color histograms of an object category 
are saved into memory and updated thereafter. New object templates are classified 
according to their similarity with other object templates previously recognized for all object 
categories, by computing: 



P- ^minimum (n Gi ,n G , ; ) 



If p < th (th set to 0,7) for all of the 20 histograms in an object category, then the object does 
not belong to that category. If this happens for all categories, then it is a new object. If p > th, 
then a match occurs, and the object is assigned to the category with maximum p. 
Whenever an object is recognized into a given category, the average color histogram which 
originated a better match is updated. Given an average histogram which is the result of 
averaging m color histograms, the updating consists of computing the weighted average 
between this histogram (weight m) and the new color histograms (unit weight). This has the 
advantage that color histograms evolve as more samples are obtained to represent different 
views of an object. 
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Fig. 8. (top) Recognition errors. Matches evaluated from a total of 11 scenes (objects are 
segmented and recognized more than once per scene), (bottom) Sequence from an on-line 
experiment of several minutes on the humanoid robot Cog: (1) The robot detects and 
segments a new object - a sofa; (2) New object is correctly assigned to a new category; (3) 
Object, not being tracked, is recognized from previous templates (as shown by the two sofa 
templates mapped to it); (4-5-6) Same sequence for a different, smaller sofa. 

Experimental Results for Template Matching 

Figure 8 presents quantitative performance statistics. It shows a sample of the system 
running on the humanoid robot Cog, while recognizing previously learned objects. Incorrect 
matches occurred due to color similarity among different objects (such as a big and a small 
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sofa). Errors arising from labeling an object in the database as a new object are chiefly due to 
drastic variations in light sources. Qualitative results from an on-line experiment of several 
minutes for object segmentation, tracking and recognition of new objects on the humanoid 
robot are also shown. 

Out of around 100 samples from on-line experiments, recognition accuracy average was 
of 95%. Several experiments have shown however the algorithm not capable to 
differentiate among people's faces, although it differentiated correctly between faces 
and other objects. 

3.2 Geometric Hashing 

This object recognition algorithm consists of three independent algorithms. Each recognizer 
operates along orthogonal directions to the others over the input space (Arsenio, 2004b). 
This approach offers the possibility of priming specific information such as searching for a 
specific object feature (color, shape or luminance) independently of the others. The set of 
input features are: 

■ Color: groups of connected regions with similar color 

■ Luminance: groups of connected regions with similar luminance 

■ Shape. A Hough transform is applied to a contour image (from a Canny edge 
detector). Line orientation is determined using Sobel masks. Pairs of oriented lines 
are then used as input features 

Geometric hashing (Wolfson & Rigoutsos, 1997) is a rather useful technique for high-speed 
performance. In this method, quasi-invariants are computed from training data in model 
images, and then stored in hash tables. Recognition consists of accessing and counting the 
contents of hash buckets. Recognition of objects has to occur over a variety of scene contexts. 
An adaptive Hash table (a hash table with variable-size buckets) algorithm was 
implemented to store affine color, luminance and shape invariants (which are view 
independent for small perspective deformations). Figure 4 displays two results from 
applying this algorithm using shape features (Arsenio, 2004d). 

3.3 Eigenobjects - Principal Component Analysis 

Component Analysis (PCA) is an efficient method to describe a collection of images. The 
corresponding eigenvectors are denominated eigenobjects. 

Let the training set of M images from an object n be {01, 02, . . . 0m} (see Figure 9). The 
average image of this set is defined by 

M 
i|r = 1/M y^(p[ ■ The covariance matrix for the set of training objects is thus given by (1): 

i=l 

M 

c0 n =Xr,r,^AA T (1) 

being F n = n _ i|f the difference of each image from the mean, and A = [Fi, F2, . . . , Tm] • 
Cropped faces are first rescaled to 128 x 128 images (size S = 128 2 ). Determining the 
eigenvectors and eigenvalues of the S 2 size covariance matrix C is untractable. However, C 
rank does not exceed M -1. For M < S 2 there are only M -1 eigenvectors associated to non- 
zero eigenvalues, rather than S 2 . Let Vj be the eigenvectors of the M x M matrix A T A. The 
eigenfaces |ij are given by: 
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Fig. 9. a) Face image samples are shown for each of three people out of a database of six; b) 
Average face image for three people on the database, together with three eigenfaces for each 
one; c) Confusion table with face recognition results. 

The number of basis functions is further reduced from M to M' by selecting only the most 
meaningful M' eigenvectors (with the largest associated eigenvalues), and ignoring all the 
others. Classification of the image of object cp consists of projecting it into the eigenobject 
components, by correlating the eigenvectors with it, for obtaining the coefficients 
Wi=(ii((|)-i]/) / i= 1,..., M'of this projection. The weights Wiform a vector D ={wi, wi . . . , wm'). 
An object is then classified by selecting the minimum hi distance to each object's coefficients 

in the database e^= Q - Q . where Dk describes the k th object class in the database. If e^ is 
below a threshold, then it corresponds to a new object. 

Eigenfaces: Experimental Results 

The eigenvectors are now denominated eigenfaces (Turk & Pentland, 1991), because they are 
face-like in appearance (see Figure 9 - the confusion table in this Figure presents results for 
recognizing three different people, being the average recognition accuracy of 88.9%). The 
training data set contains a lot of variation. Validation data corresponds to a random 20% of 
all the data. 

Eigensounds for Sound Recognition 

A collection of annotated acoustic signatures for each object are used as input data (see 
Figure 10) for a sound recognition algorithm by applying the eigenobjects method. A sound 
image is represented as a linear combination of base sound signatures (or eigensounds). 
Classification consists of projecting novel sounds to this space, determining the coefficients 
of this projection, computing the L2 distance to each object's coefficients in the database, and 
selecting the class corresponding to the minimum distance. 

Cross-modal information aids the acquisition and learning of unimodal percepts and 
consequent categorization in a child's early infancy. Similarly, visual data is employed here 
to guide the annotation of auditory data to implement a sound recognition algorithm. 
Training samples for the sound recognition algorithm are classified into different categories 
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by the visual object recognition system or from information from the visual object tracking 
system. This enables the system, after training, to classify the sounds of objects not visible. 
The system was evaluated quantitatively by random selection of 10% of the segmented data 
for validation, and the remaining data for training. This process was randomly repeated 
three times. It is worth noticing that even samples received within a short time of each other 
often do not look too similar, due to background acoustic noise, noise on the segmentation 
process, other objects' sounds during experiments, and variability on how objects are 
moved and presented to the robot. For example, the car object is heard both alone and with 
a rattle (either visible or hidden). The recognition rate for the three runs averaged to 82% 
(86.7%, 80% and 80%). Recognition rates by object category were: 67% for the car, 91.7% for 
the cube rattle, 77.8% for the snake rattle and 83.3% for the hammer. Most errors arise from 
mismatches between (car and hammer) sounds. 
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1)7 random sound samples for each of 4 objects. From top to 
bottom: hammer, cube rattle, car and snake rattle, respectively 



2) Average 3) Eigenobjects corresponding 
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Fig. 10. Sound recognition. Acoustic signatures for four objects are shown along the rows. 
(1) Seven sound segmentation samples are shown for each object, from a total of 28 (car), 49 
(cube rattle), 23 (snake rattle) and 34 (hammer) samples. (2) Average acoustic signatures. 
The vertical axis corresponds to the frequency bands and the horizontal axis to time 
normalized by the period. (3) Eigensounds corresponding to the three highest eigenvalues. 
The repetitive nature of the sound generated by an object under periodic motion can be 
analyzed to extract an acoustic signature for that object. We search for repetition in a set of 
frequency bands, collecting those whose energies oscillate together with a similar period 
(Fitzpatrick & Arsenio, 2004). 



3.4 Mixture of Gaussians for Scene Recognition 

Wavelets (Strang & Nguyen, 1996) are employed to extract contextual features. Processing is 
applied iteratively through the low frequency branch of the wavelet transform over T=5 
scales, while higher frequencies along the vertical, horizontal and diagonal orientations are 
stored (due to signal polarity, this corresponds to a compact representation of six 
orientations in three images). The input is thus represented by v(x, y) = v(p) ={vk(x, y), 
k=l,..., N}, with N=3T=15. Each wavelet component at the i' h level has dimensions 256/2' x 
256/ 2', and is down-sampled to an 8 * 8 image: 



7 (^> , )=X v ( i 'J) h ( i - x 'J-y) 



(3) 
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where h(x,y) is a Gaussian window. Thus, v(x,y) has dimension 960. Similarly to other 
approaches (Torralba, 2003), the dimensionality problem is reduced to become tractable by 
applying Principal Component Analysis (PC A). The image features v{p) are decomposed 
into the basis functions given by the PCA: 

D 

i=l p,k 

where the functions <p' (p) are the eigenfunctions of the covariance operator given by 
v, (p) ■ These functions incorporate both spatial and spectral information. The 
decomposition coefficients are obtained by projecting the image features v, (p) into the 
principal components Cj, used hereafter as input context features. 

The vector c ~{cy i= 1, . . . , D} denotes the resulting D-dimensional input vector, with D=E nv 
2< D< Th , where m denotes a class, Th an upper threshold and E m denotes the number of 
eigenvalues within 5% of the maximum eigenvalue. These features can be viewed as a 
scene's holistic (Oliva & Torralba, 2001) representation since all the regions of the image 
contribute to all the coefficients, as objects are not encoded individually. The effect of 
neglecting local features is reduced by mapping the foveal camera (which grabs data for the 
object recognition scheme based on local features) into the image from the peripheral view 
camera, where the weight of the local features v 7 is strongly attenuated. The vector p is 
thus given in wide field of view retinal coordinates. 

A collection of images is automatically annotated by the robot (Arsenio, 2004b;c) and 
used as training data. Mixture models are applied to find interesting places to put a 
bounded number of local kernels that can model large neighborhoods. In D- 
dimensions a mixture model is denoted by density factorization over multivariate 
Gaussians (spherical Gaussians were selected for faster processing times), for each 
object class n: 

M 

p(c\o ) = Yb G(c,u ,C ) 

where G m refers to the m* Gaussian with mean Ji and covariance matrix Cm, M is the 

number of Gaussian clusters, and b m = P(G m ) are the weights of the local models. The 
estimation of the parameters will follow the EM algorithm (Gershenfeld, 1999): 

■ E-step for k-iteration: From the observed data c , compute the a-posteriori cluster 
probability <?*„(/): 

e (/) = p(c \c) = : : ■ \°) 
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M-step for k-iteration: cluster parameters are estimated according to the 
maximization of the join likelihood of the L training data samples 
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Fig. 11 Test images (wide field of view) organized with respect to p(o \ c) ■ Top row: 
o n =scene 1/ p(scene l \ c) > 0.5 ; Bottom row: o n =scene2, p(scene 2 \ c) > 0.5 • Scene descriptions 
shown in the right column are built on-line, automatically (Arsenio, 2004b;c). 

The EM algorithm converges as soon as the cost gradient is small enough or a maximum 
number of iterations is reached. The probability density function (PDF) for an object n is 
then given by Bayes' rule p(p n \ c) = p{c \ o„)p(o n )/ p(o n ) where 

p(c) = p(c | o n )p(o n ) + p(c | -.o„)p(-,o„) • 

The same method applies for the out-of-class PDF p(c I — ,o ) which represents the statistical 
feature distribution for the input data in which o n is not present. 

Finally, it is necessary to select the number M of gaussian clusters. This number can be 
selected as the one that maximizes the join likelihood of the data. An agglomerative 
clustering approach based on the Rissanen Minimum Description Length (MDL) order 
identification criterion (Rissanen, 1983) was implemented to automatically estimate M 
(Figure 11 shows algorithm results for classifying two scenes). 

3.5 Dynamic Programming for Recognition from Cross-Modal Cues 

Different objects have distinct acoustic-visual patterns which are a rich source of 
information for object recognition, if we can recover them. The relationship between object 
motion and the sound generated varies in an object-specific way. A hammer causes sound 
after striking an object. A toy truck causes sound while moving rapidly with wheels 
spinning; it is quiet when changing direction. These statements are truly cross-modal in 
nature. Features extracted from the visual and acoustic segmentations are what is needed to 
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build an object recognition system (Fitzpatrick & Arsenio, 2004). The feature space for 
recognition consists of: 

■ Sound/ Visual period ratios - the sound energy of a hammer peaks once per visual 
period, while the sound energy of a car peaks twice. 

■ Visual/ Sound peak energy ratios - the hammer upon impact creates high peaks of 
sound energy relative to the amplitude of the visual trajectory. 

Dynamic programming is applied to match the sound energy to the visual trajectory signal. 
Formally, let S = (Si, . . . , S n ) and V= (Vi, . . . , V m ) be sequences of sound and visual 
trajectory energies segmented from n and m periods of the sound and visual trajectory 
signals, respectively. Due to noise, n may be different to m. If the estimated sound period is 
half the visual one, then V corresponds to energies segmented with 2m half periods (given 
by the distance between maximum and minimum peaks). A matching path P = (Pi, . . . , Pi) 
defines an alignment between S and M, where max(m, n)<l<m + n-l, and Pk = (i, j), a 
match k between sound cluster] and visual cluster i. The matching constraints are set by: 

■ The boundary conditions: Pi= (1, 1) and Pi= (m, n). 

■ Temporal continuity: Pk+iG {(i+1, j + l)/ (i+T ]), (h j+1)}- Steps are adjacent 
elements of P. 

The function cost Cy is given by the square difference between Vi and Sj periods. The best 
matching path W can be found efficiently using dynamic programming, by incrementally 
building an m x n table caching the optimum cost at each table cell, together with the link 
corresponding to that optimum. The binding W will then result by tracing back through 
these links, as in the Viterbi algorithm. 
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Fig. 12. Object recognition from cross-modal clues. The feature space consists of period and 
peak energy ratios. The confusion matrix for a four-class recognition experiment is shown. 
The period ratio is enough to separate the cluster of the car object from all the others. 
Similarly, the snake rattle is very distinct, since it requires large visual trajectories for 
producing soft sounds. Errors for categorizing a hammer originated exclusively from 
erroneous matches with the cube rattle, because hammering is characterized by high energy 
ratios, and very soft bangs are hard to identify correctly. The cube rattle generates higher 
energy ratios than the snake rattle. False cube recognitions resulted mostly from samples 
with low energy ratios being mistaken for the snake. 
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Experimental Results: Figure 12 shows cross-modal features for a set of four objects. It 
would be hard to cluster automatically such data into groups for classification. But as in the 
sound recognition algorithm, training data is automatically annotated by visual recognition 
and tracking. After training, objects can be categorized from cross-modal cues alone. The 
system was evaluated by selecting randomly 10% of the data for validation, and the 
remaining data for training. This process was randomly repeated 15 times. The recognition 
rate averaged over all these runs were, by object category: 86.7%for the cube rattle, 100% for 
both the car and the snake rattle, and 83% for the hammer. The overall recognition rate was 
92.1%. Such results demonstrate the potential for recognition using cross-modal cues. 

3.6 Weighted Cluster Modeling 

Objects in the world are situated, in the sense that they usually appear in specific places. 
Children are pretty good at learning the relative probability distribution of objects in a scene 
- for instance, chairs are most probable in front of desks, but not in a ceiling. The scene 
context puts a very important constraint on the type of places in which a certain object 
might be found. From a humanoid point of view, contextual selection of the attentional 
focus is very important both to constrain the search space for identifying or locating objects 
(optimizes computational resources) and also to determine common places on a scene to 
drop or store objects. 

Therefore, it is important to develop a model for the contextual control of the attentional 
focus (location and orientation), scale selection and depth inference. The output space is 
defined by the 6-dimensional vector x = (p, d, s, </)) , where p is a 2D position vector, d is 
the object's depth (Arsenio, 2004b;c), s = (w, h) is a vector containing the principal 
components of the ellipse that models the 2D retinal size of the object, and <p is the 

orientation of such ellipse. Given the context c , we need to evaluate the PDF p(x \ o ,c) 
from a mixture of (spherical) Gaussians (Gershenfeld, 1999), 

M 

p(x,c\o) = Yb G(x,n ,X )G(c,u k ,C k ) (8) 

m=l 

The mean of the new Gaussian G(x,fj ,X ) is now a function fj = f(c,0 )/ that 
depends on c and on a set of parameters p m ,n- A locally affine model was chosen for/, with 
\B = (a ,A):n =a +A T c\- The learning equations become now (Gershenfeld, 
1999): 

■ E-step for k-iteration: From the observed data c and X , compute the a-posteriori 
probabilities of the clusters: 

e (l) = —r, 

m,n v / M 

/_, b m ,n G{x, fj mn , X mn )G{c, fi mn ,C mn ) 

:n=\ 

• M-step for k-iteration: cluster parameters are estimated according to (where m 
indexes the M clusters, and 1 indexes the number L of samples): 

C 1= <(^-06-/O r > m ( 9 ) 
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A k z=(c k :x <(c-p)(x-iiy > m (io) 

a k+l =<(x-(A k+x ) T c)> 

m,n v v m,n / / m 



a i+1 =<(x-(^ +, ) r c)> (11) 



=<(x — a — M ) c)(x — a —(A ) c) > (12) 

All vectors are column vectors and <>m in (9) represents the weighted average with respect 

to the posterior probabilities of cluster m. 

The parameters b k ar >d means D k+1 are estimated as before. The conditional probability 

follows then from the joint PDF of the presence of an object On, at the spatial location p, with 
pose (h , size s and depth d, given a set of contextual image measurements c '■ 

p(x\o n ,c) = s 






K G(c,n k ,c k ) 



Object detection and recognition requires the evaluation of this PDF at different locations in 
the parameter space. The mixture of gaussians is used to learn spatial distributions of objects 
from the spatial distribution of frequencies in an image. 

Figure 13 presents results for selection of the attentional focus for objects from the low-level 
cues given by the distribution of frequencies computed by wavelet decomposition. 
Some furniture objects were not moved (such as the sofas), while others were moved in different 
degrees: the chair appeared in several positions during the experiment, while the table and door 
suffered mild displacements. Still, errors on the head gazing control added considerable location 
variability whenever a non-movable object was segmented and annotated. It demonstrates that, 
given an holistic characterization of a scene (by PCA on the image wavelet decomposition 
coefficients), one can estimate the appropriate places whether objects often appear, such as a 
chair in front of a table, even if no chair is visible at the time - which also informs that regions in 
front of tables are good candidates to place a chair. Object occlusions by people are not relevant, 
since local features are neglected, favoring contextual ones. 

3.7 Back-propagation Neural Networks 

Activity Identification 

A feature vector for activity recognition was proposed by (Polana & Nelson, 1994) which 
accounts for 3-dimensional information: 2-dimensional spatial information plus temporal 
information. The feature vector is thus a temporal collection of 2D images. Each of these 
images is the sum of the normal flow magnitude (computed using a differential method) - 
discarding information concerning flow direction - over local patches, so that the final 
resolution is of 4x4. The normal flow accounts only for periodically moving pixels. 
Classification is then performed by a nearest centroid algorithm. 

Our strategy reduces the dimensionality of the feature vector to 2-dimensional. This is done 
by constructing a 2D image which contains a description of an activity. Normalized length 
trajectories over one period of the motion are mapped to an image, in which the horizontal 
axis is given by the temporal scale, and the vertical axis by 6 elements describing position 
and 6 elements for velocities. The idea is to map trajectories into images. This is 
fundamentally different to the trajectory primal-sketch approach suggested in (Gould & 
Shah, 1989), which argues for compact representations involving motion discontinuities. We 
opt instead for using redundant information. 
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Fig. 13. Localizing and recognizing objects from contextual cues, (top) Samples of scene 
images are shown on the first column. The next five columns show probable locations based 
on context for finding a door, the smaller sofa, the bigger sofa, the table and the chair, 
respectively. Even if the object is not visible or present, the system estimates the places at 
which there is a high probability of finding such object. Two such examples are shown for 
the chair. Occlusion by humans do not change significantly the context, (bottom) Results in 
another day, with different lightning conditions. 

Activities, identified as categories which include objects capable of similar motions, and the 
object's function in one activity, can then be learned by classifying 12 x 12 image patterns. 
One possibility would be the use of eigenobjects for classification (as described in this 
chapter for face and sound recognition). Eigenactivities would then be the corresponding 
eigenvectors. We opted instead for neural networks as the learning mechanism to recognize 
activities. 

Target desired values, which are provided by the multiple object tracking algorithm, are 
used for the annotation of the training samples - all the training data is automatically 
generated and annotated, instead of the standard manual, offline annotation. An input 
feature vector is recognized into a category if the corresponding category output is higher 
than 0.5 (corresponding to a probability p > 0;5). Whenever this criterion fails for all 
categories, no match is assigned to the activity feature vector - since the activity is estimated 
as not yet in the database, it is labeled as a new activity. 

We will consider the role of several objects in experiments taken for six different activities. 
Five of these activities involve periodic motion: cleaning the ground with a swiping 
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brush; hammering a nail-like object with a hammer; sawing a piece of metal; moving a 
van toy; and playing with a swinging fish. Since more information is generated from 
periodic activities, they are used to generate both training and testing data. The remaining 
activity, poking a lego, is detected from the lego's discontinuous motion after poking. 
Figure 14 shows trajectories extracted for the positions of four objects from their 
sequences of images. 

A three layer neural network is first randomly initialized. The input layer has 144 
perceptron units (one for each input), the hidden layer has six units and the output layer has 
one perception unit per category to be trained (hence, five output units). Experiments are 
run with a set of (15, 12, 15, 6, 3, 1) feature vectors (the elements of the normalized activity 
images) for the swiping brush, hammer, saw, van toy, swinging fish and lego, respectively. 
A first group of experiments consists of selecting randomly 30% of these vectors as 
validation data, and the remaining as training data. The procedure is repeated six times, so 
that different sets of validation data are considered. The other two groups of experiments 
repeat this process for the random selection of 20% and 5% of feature vectors as validation 
data. The correspondent quantitative results are presented in figure 15. 




-flu - w - 



Fig. 14. a) Signals corresponding to one period segments of the object's trajectories 
normalized to temporal lengths of 12 points. From top to bottom: image sequence for a 
swiping brush, a hammer, a van toy and a swinging fish, b) Normalized centroid positions 
are shown in the left column, while the right column shows the (normalized and scaled) 
elements of the affine matrix R' (where the indexes represents the position of the element on 
this matrix). 
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The lego activity, not represented in the training set, was correctly assigned as a new activity for 
67% of the cases. The swinging fish was correctly recognized for just 17% of the cases, being the 
percentage of no matches equal to 57%. We believe that this poor result was due to the lack of a 
representative training set - this assumption is corroborated by the large number of times that 
the activity of swinging a fish was recognized as a new activity. The swiping brush was wrongly 
recognized for 3,7% of the total number of trials. The false recognitions occurred for experiments 
corresponding to 30% of the validation data. No recognition error was reported for smaller 
validation sets. All the other activities were correctly recognized for all the trials. 









J-l r 

i t r"i , 1 


-, 
i i 


. rh 



nc*n*ci 



TStmait) 
dovwnian 



MB* flSli 

Niimbsr at training points 



Cnnfuwn latik 

<l>chM1tlgSM 


^ — 


3d IT 


Van lay 


fish 


Noi maimed 


Sniping Bwti 


0.96 


0.037 


Hanwwr 


1 





6 








S«* 


V 


1 


G 





P 


Van :oy 





a 


1 


o 





BBfl 


o.ose o 


D 


o K 


0.17 


57 


lego 


D 


0.056 


056 


Oii 


u„t? 



Fig. 15. Experimental results for activity recognition (and the associated recognition of object 
function). Each experiment was ran six times for random initial conditions. Top graph) from 
left to right columns: 30%, 20% and 5% of the total set of 516 feature vectors are used as 
validation data. The total number of training and validation points, for each of the six trials 
(and for each of the 3 groups of experiments), is (15, 12, 15, 6, 3, 1) for the swiping brush, 
hammer, saw, van toy, swinging fish and lego, respectively. The three groups of columns 
show recognition, error and missed-match rates (as ratios over the total number of 
validation features). The bar on top of each column shows the standard deviation. Bottom 
table: Recognition results (as ratios over the total number of validation features). Row i and 
column j in the table show the rate at which object i was matched to object j (or to known, if j 
is the last column). Bold numbers indicate rates of correct recognitions. 

Sound Recognition 

An artificial neural network is applied off-line to the same data collected as before for sound 
recognition. The 32 x 32 sound images correspond to input vectors of dimension 1024. 
Hence, the neural network input layer contains 1024 perceptron units. The number of units 
in the hidden layer was set to six, while the output layer has four units corresponding to the 
four categories to be classified. The system is evaluated quantitatively by randomly selecting 
40%, 30% and 5% of the segmented data for validation, and the remaining data for training. 
This process was randomly repeated six times. This approach achieves higher recognition 
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rates when compared to eigensounds. The overall recognition rate is 96,5%, corresponding 
to a significant improvement in performance. 

3.8 Other Learning Techniques 

Other learning techniques exploited by Cog's cognitive system includes nearest-neighbor, 
locally linear receptive-field networks, and Markov models. 

Locally Linear Receptive-field Networks 

Controlling a robotic manipulator on the cartesian 3D space (eg. to reach out for objects) 
requires learning its kinematics - the mapping from joint space to cartesian space - as well 
as the inverse kinematics mapping. This is done through locally weighted regression and 
Receptive-field weighted regression, as proposed by (Schaal et al., 2000). This 
implementation on the humanoid robot Cog is described in detail by (Arsenio 2004c;d). 

Markov Chains 

Task descriptions can be modeled through a finite Markov Decision Process (MDP), defined 
by five sets <S; A; P;R;0 >. Actions correspond to discrete, stochastic state- transitions 
ae A={Periodicity, Contact, Release, Assembling, Invariant Set, Stationarity} from an 
environment's state s r e S to the next state s,+l, with probability P" e P , where P is a set of 

transition probabilities P", = P \s. = s' \ s, a}- 

Task learning consists therefore on determining the states that characterize a task and mapping 
such states with probabilities of taking each possible action (Arsenio, 2003; Arsenio, 2004d). 

4. Cognitive development of a Humanoid Robot 

The work here described is part of a complex cognitive architecture developed for the 
humanoid robot Cog (Arsenio, 2004d), as shown in Figure 16. This chapter focused on a 
very important piece of this larger framework implemented on the robot. The overall 
framework places a special emphasis on incremental learning. A human tutor performs 
actions over objects while the robot learns from demonstration the underlying object 
structure as well as the actions' goals. This leads us to the object/scene recognition problem. 
Knowledge concerning an object is organized according to multiple sensorial percepts. After 
object shapes are learned, such knowledge enables learning of hand gestures. Objects are 
also categorized according to their functional role (if any) and their situatedness in the 
world. Learning per si is of diminished value without mechanisms to apply the learned 
knowledge. Hence, robot tasking deals with mapping learned knowledge to perceived 
information, for the robot to act on objects, using control frameworks such as neural 
oscillators and sliding-motion control (Arsenio, 2004). 

Teaching a humanoid robot information concerning its surrounding world is a difficult task, 
which takes several years for a child, equipped with evolutionary mechanisms stored in its 
genes, to accomplish. 

Learning aids such as books or educational, playful activities that stimulate a child's brain are 
important tools that caregivers extensively apply to communicate with children and to boost 
their cognitive development. And they also are important for human-robot interactions. 
If in the future humanoid robots are to behave like humans, a promising venue to achieve 
this goal is by treating then as such, and initially as children - towards the goal of creating a 
2-year-old-infant-like artificial creature. 
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Fig. 16. Overview of the cognitive architecture developed for the humanoid robot Cog. 

5. Conclusions 

We proposed in this chapter the application of a collection of learning algorithms to solve a 
broad scope of problems. Several learning tools, such as Weighted-cluster modeling, 
Artificial Neural Networks, Nearest Neighbor, Hybrid Markov Chains, Geometric Hashing, 
Receptive Field Linear Networks and Principal Component Analysis, were extensively 
applied to acquire categorical information about actions, scenes, objects and people. 
This is a new complex approach to object recognition. Objects might have various meanings 
in different contexts - a rod is labeled as a pendulum if oscillating with a fixed endpoint. 
From a visual image, a large piece of fabric on the floor is most often labeled as a tapestry, 
while it is most likely a bed sheet if it is found on a bed. But if a person is able to feel the 
fabric's material or texture, or the sound that it makes (or not) when grasped with other 
materials, then (s)he might determine easily the fabric's true function. Object recognition 
draws on many sensory modalities and the object's behavior, which inspired our approach. 
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1. Introduction 

Realization of natural and energy-efficient dynamic walking has come to be one of the 
main subjects in the research area of robotic biped locomotion. Recently, many 
approaches considering the efficiency of gait have been proposed and McGeer's passive 
dynamic walking (McGeer, 1990) has been attracted as a clue to elucidate the mechanism 
of efficient dynamic walking. Passive dynamic walkers can walk down a gentle slope 
without any external actuation. Although the robot's mechanical energy is dissipated by 
heel-strike at the stance-leg exchange instant, the gravity potential automatically restores 
it during the single-support phase in the case of passive dynamic walking on a slope and 
thus the dynamic walking is continued. If we regard the passive dynamic walking as an 
active one on a level, it is found that the robot is propelled by the small gravity in the 
walking direction and the mechanical energy is monotonically restored by the virtual 
control inputs representing the small gravity effect. Restoration of the mechanical energy 
dissipated by heel-strike is a necessary condition common to dynamic gait generations 
from the mathematical point of view, and efficient active dynamic walking should be 
realized by reproducing this mechanism on a level. Mechanical systems satisfy a relation 
between the control inputs and the mechanical energy, the power-input for the system is 
equal to the time-derivative of mechanical energy, and we introduce a constraint 
condition so that the time-change rate of mechanical energy is kept positive constant. 
The dynamic gait generation is then specified by a simple redundant equation including 
the control inputs as the indeterminate variables and yields a problem of how to solve 
the equation in real-time. The ankle and the hip joint torques are determined according 
to the phases of cycle based on the pre-planned priority. The zero moment point 
(Vukobuatovic & Stepanenko, 1972) can be easily manipulated by adjusting the ankle- 
joint torque, and the hip-joint torque in this case is secondly determined to satisfy the 
desired energy constraint condition with the pre-determined ankle-joint torque. Several 
solutions considering the zero moment point condition are proposed, and it is shown 
that a stable dynamic gait is easily generated without using any pre-designed desired 
trajectories. The typical gait is analyzed by numerical simulations, and an experimental 
case study using a simple machine is performed to show the validity of the proposed 
method. 
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2. Compass-like Biped Robot 

In this chapter, a simplest planar 2-link full-actuated walking model, so-called compass-like 
walker (Goswami el ah, 1996), is chosen as the control object. Fig. 1 (left) shows the 
experimental walking machine and closed up of its foot which was designed as a nearly 
ideal compass-like biped model. This robot has three DC motors with encoders in the hip 
block to reduce the weight of the legs. The ankle joints are driven by the motors via timing 
belts. Table lists the values of the robot parameters. Fig. 1 (right) shows the simplest ideal 

compass-like biped model of the experimental machine, where TYl H , Hi [kg] and / = a + O 

[m] are the hip mass, leg mass and leg length, respectively. Its dynamic equation during the 
single-support phase is given by 

M(O)d + C(O,d)O + g(0) = T, (1) 

where = \0 1 T is the angle vector of the robot's configuration, and the details of the 
matrices are as follows: 

m H l 2 +ma 2 +ml 2 -mbl cos [0 l —6 2 \ 

-mWcos((? 1 ~0 2 ) m b 2 



M(0)-- 



C(0,9)-- 



g{0)- 




mbl sin (0,-0 2 )0, 
— (m H l + ma + m/)s 



-mblsm(0 l -0 2 )0 2 




(2) 



mb sin 9 2 
and the control torque input vector has the form of 



T=SU = 







(3) 



The transition is assumed to be inelastic and without slipping. With the assumption and 
based on the law of conservation of angular momentum, we can derive the following 
compact equation between the pre-impact and post-impact angular velocities 

Q + (a)t=Q-(a)0-, ' (4) 



where 



e + («)= 


m H l 2 +ma 2 + mill -bcos(2a)j mb(b-lcos(2a 
-wWcos(2a) mb 2 


e-(«)= 


[m H l 2 + 2mal ) cos (2 a) -mab -mab 
-mab 


' 


and CC [rad] is the half inter-leg angle at the heel-strike instant given by 




0, —0 2 2 -0, 
a = — — = — ->0- 





(5) 



(6) 
2 2 

For further details of derivation, the authors should refer to the technical report by Goswami 
el al. This simplest walking model can walk down a gentle slope with suitable choices of 
physical parameters and initial condition. Goswami el al. discovered that this model exhibits 
period-doubling bifurcations and chaotic motion (Goswami el al., 1996) when the slope 
angle increases. The nonlinear dynamics of passive walkers are very attractive but its 
mechanism has not been clarified yet. 
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Fig. 1. Experimental walking machine and its foot mechanism (left), and its ideal model (right). 
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Table 1. Physical parameters of the experimental machine. 



3. Passive Dynamic Walking Mechanism Revisited 

Passive dynamic walking has been considered as a clue to elucidate to clarify the essence of 
efficient dynamic walking, and the authors believe that it is worth investigating the 
automatic gait generation mechanism. The impulsive transition feature, non double-support 
phase, can be intuitively regarded as vigor for high-speed and energy-efficient walking. In 
order to get the vigor, the walking machine must restore the mechanical energy efficiently 
during the single-support phase, and the impulsive and inelastic collision with the ground 
dissipates it discontinuously. In the following, we describe it in detail. 

The passive dynamic walker on a gentle slope can be considered to walk actively on a virtual 
level ground whose gravity is gcos(zi as shown in Fig. 2. The left robot in the figure is 
propelled forward by the small gravity element of gsin0, and the right one walks by 
equivalent transformed torques. By representing this mechanism in the level walking, 
energy-efficient dynamic bipedal gait should be generated. The authors proposed virtual 
gravity concept for the level walking and called it "virtual passive dynamic walking." 
(Asano & Yamakita, 2001) The equivalent torques u and u are given by transforming the 

effect of the horizontal gravity element g sin <f> as shown in Fig. 2 left. 
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Let us define virtual total mechanical energy £ under the gravity condition of Fig. 2 as 
follows: 



(0,0J)_\0 M{0)0 P(0,(, 



(7) 



where the virtual potential energy is given by 

P{0,(f) = {( m H l + ma + ml) cos (6^ -<fi)-mbcos(0 2 -#))},§■ cos (# • (°) 

In the case of passive dynamic walking on a slope, the total mechanical energy is kept 
constant during the single-support phase, whereas ,» does not exhibit such behaviour. Fig. 
3 shows the simulation results of passive dynamic walking on a gentle slope whose 
magnitude is 0.01 [rad]. (c) and (d) show the evolutions of the equivalent transformed 
torques and virtual energy £ , respectively. From (c), we can see that both u 1 and u i are 

almost constant-like and thus the ZMP should be kept within a narrow range. This property 
is effective in the virtual passive dynamic walking from the viewpoint of the stability of foot 
posture (Asano & Yamakita, 2001). It is apparent from (d) that the mechanical energy is 
dissipated at the transition instant and monotonically restored during the swing phase. Such 
energy behaviour can be considered as an indicator of efficient dynamic walking. 



g COS <j) 




Fig. 2. Gravity acceleration mechanism of passive dynamic walking. 

In general, we can state the following. 

CHI) The total mechanical energy of the robot ^ increases monotonically during the swing 

phase. 

CH2) > always holds. 

CH3) There exists an instant when 0-0 = • 

CHI and CH2 always hold, regardless of physical and initial conditions, but CH3 does not 
always hold, as it depends on physical parameters and slope angle. We can confirm CH2 
and CH3 from Fig. 3 (a) and (b). It is also clear that CHI holds from Fig. 3 (d). From the 
results, the essence of a passive dynamic gait should be summarized as follows. 
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El) The walking pattern is generated automatically, including impulsive transitions, and 

converges to a steady limit cycle. 

E2) The total mechanical energy is restored during the single-support phase monotonically, 

and is dissipated at every transition instant impulsively by heel-strike with the floor. 

E2 is considered to be an important characteristic for dynamic gait generation, and is the 

basic concept of our method. We will propose a simple method imitating the property in the 

next section. 
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Fig. 3. Simulation results of passive dynamic walking on a slope where (j> = 0.01 [rad]. 
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4. Energy Constraint Control 

In our previous works, we have proposed virtual passive walking considering an artificial 
gravity condition called virtual gravity (Asano & Yamakita, 2001). This imitates the gravity 
acceleration mechanism of the original passive dynamic walking. A virtual gravity in the 
walking direction acts as a driving force for the robot and the stable limit cycle can be 
generated automatically without any gait design in advance. Determining a virtual gravity 
is, however, equivalent to that of control inputs, so there is no freedom to control other 
factors, for example, ZMP control. By imitating the property of monotonic energy 
restoration, however, we can formulate a simple method with a freedom of the control 
inputs. 

4.1 The Control Law 

The total mechanical energy of the robot can be expressed as 

E{e,e) = -e T M(0)0+P{0)' ( 9 ) 

where P is the potential energy. The power input to the system is the time-change rate of 
the total energy, that is 

E = e T T = e T Su. (io) 

Suppose now that we use a simple control law imitating the characteristic CHI, monotonic 
energy restoration. Let X > be a positive constant and consider the following condition: 

E = X. (11) 

This means that the robot's mechanical energy increases monotonically with a constant rate 
of X ■ We call this control or gait generation method "Energy Constraint Control (ECC)". In 
this method, the walking speed becomes faster w.r.t. the increase of X , in other words, the 
magnitude of X corresponds to the slope angle of virtual passive dynamic walking. Here let 
us consider the following output function: 

H(t) = E-X = t t-X, (12) 

and the target constraint condition of Eq. (11) can be rewritten as H(t) = 0- Therefore, the 

ECC can be regarded in this sense as an output zeroing control. 

Following Eqs. (10) and (11), the detailed target energy constraint condition is expressed as 

E = J Su = 1 u 1 +(0 1 -0 2 )u 2 =A> (13) 

which is a redundant equation on the control inputs. The dynamic gait generation then 
yields a problem of how to solve the redundant equation for the control inputs u \ and u i in 
real-time. The property of ECC strategy is that the control inputs can be easily determined 
by adjusting the feed-forward parameter A , which can be determined by considering the 
magnitude of E of virtual passive dynamic walking. 

4.2 Relation between ZMP and Reaction Moment 

The actual walking machine has feet and a problem of reaction moment then arises. The 
geometrical specifications of the stance leg and its foot are shown in Fig. 4. In this chapter, 
the ZMP is calculated by the following approach. We assume: 

1. The mass and volume of the feet can be ignored. 

2. The sole always fits with the floor. 
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Under these assumptions, we can calculate the ZMP in the coordinate shown in Fig. 4 left as: 

ZMP = -^ (14) 

R n 

where U l [Nm] is the ankle torque acting not on the foot link but on the leg link and R„ [N] 
is the vertical element of the reaction force, respectively. 

From Fig. 4, it is obvious that the ZMP is always shifted behind the ankle joint when driving 
the stance-leg forward, however, at the transition instant, the robot is critically affected by 
the reaction moment from the floor as shown in Fig. 4 right. Considering the reaction 
moment effect, we can reform the ZMP equation for the simplest model as follows: 

ZM P = -^^ ( 15 ) 

where u > represents the equivalent torque of the reaction moment, and the ZMP is 

shifted backward furthermore. u Tm acts as a disturbance for the transition. Since the actual 
walking machines generally have feet with toe and heel, this problem arises. From the 
aforementioned point of view, we conclude that the ZMP should be shifted forward the 
ankle-joint just after the transition instant to cancel the reaction moment. Based on the 
observation, in the following, we consider an intuitive ZMP manipulation algorithm 
utilizing the freedom of the redundant equation of (13). 




Walking direction 




TUP ZMP 

Fig. 4. Foot mechanism and reaction moment at the heel-strike instant. 

4.3 Principal Ankle-joint Torque Control 

From a practical point of view, as mentioned above, the two most important control factors 
of dynamic bipedal walking are mechanical energy restoration and ZMP control. To keep 
the energy constraint condition of Eq. (11), we should reconsider the solution algorithm. 
Firstly, we should consider mechanical energy restoration to generate a gait, and secondly, 
ZMP condition must be guaranteed without destroying the constraint condition. Based on 
the considerations, we first discuss the following solution approach: 

1. Determine the value of A . 

2. Determine the ankle torque U l . 

3. By substituting A and U t into Eq. (13), we can solve it for U 2 ■ 

In order to shift the ZMP, let us consider the following simple ankle-joint torque control: 
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ftT<0 s<T (16) 

\u + > otherwise 
where S is a virtual time that is reset at every transition instant and s = 1 .0 • This comes 
from the fact that U { must be negative to shift the ZMP forward the ankle-joint, and if U l > 

the ZMP moves behind the ankle-joint. In this case, U 2 is obtained after U l as follows: 

U2 = A r^ u± - ( 17 ) 

l -Q 1 

Note that U 2 has a singularity at -Q = which was mentioned before as CH3. This 

condition must be taken into account. We then propose a switching control law described 

later. Before it, we consider a more reasonable switching algorithm from U to U . In 
general, for most part of a cycle from the beginning, the condition —0 >0 holds (See Fig. 

3 (b)), and thus the sign of U 2 of Eq. (17) is identical with that of X - 0,u, ■ If u = u~ / this sign 
is positive because of X,6 X >0 and u<0. At the beginning of a cycle, X-6{U + increases 
monotonically because of < (See Fig. 3 (b)). Therefore in general the condition 

±U-0y) = -0y>o ( 18 ) 

holds regardless of the system parameter choice. Therefore, if X < u + at the beginning, it is 
reasonable to switch when 

A-0 { u + =O (19) 

so as to keep U 2 of Eq. (17) always positive under the condition of Q —0 >0- In addition, 
by this approach the hip-joint torque can always contribute the mechanical energy 
restoration. The switching algorithm of i/j is summarized as follows: 

\u~ <0 A< 0\u* (20) 



\u* > otherwise 
The value of W must be determined empirically based on the simulation results of virtual 

passive dynamic walking, whereas U should be determined carefully so as not to destroy 
the original limit cycle or disturb the forward acceleration. Choosing the suitable 

combination between A and U is the most important for generating a stable limit cycle. 

4.4 Principal Hip-joint Torque Control 

As mentioned before, we must switch the controller to avoid the singularity of CH3 at the 
end of the single-support phase. As a new method, we propose the following new strategy: 

1. Determine the value of A . 

2. Determine the hip torque U 2 . 

3. By substituting A and U 2 into Eq. (13), we can solve it for U l . 
In this case, U l is determined by the following formula: 
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x-[e l -0 1 )u 1 



(21) 



Note that here we use the assumption of CH2. In this paper, as a reasonable candidate of U 2 , 
we consider the following form: 

u 2 =r]{e,-0 2 )- ( 22 ) 

Assuming rj>0, this leads the following inequality: 

(4-4)« 2 =) 7(<9i-^) 2 ^0/ (23) 

therefore it is found that this hip-joint torque U 2 also contributes the mechanical energy 
restoration. 
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Walking direction 





Reaction moment 
ZMP ^^^^^^^^^™ ZMP 

Fig. 5. Simulation results of dynamic walking by ECC considering ZMP control. 

4.5 Switching Control 

In order to manipulate the ZMP as well as to avoid the singularity, we must consider a 
switching algorithm from the principal ankle to hip-joint torque control. We here introduce 

the switching timing as 0,=y/ [rad]. At this instant, we reset tj so that U 2 becomes 

continuous according to the following relationship: 

(24) 



■■T)(t 



from which we can calculate Tj as follows: 



')' 



A- 



■) /}SW + 

A-u, u 



(25) 



(<?r-4 sw ) 

where the superscript "sw" stands for the switching instant. The obtained Tj is used during 

its cycle and reset at every switching instant. Since U 2 is continuous, Wj also becomes 

continuous. 

Fig. 5 shows the simulation results of the dynamic walking by ECC with the proposed 
switching control. The control parameters are chosen as A = 0.07 [J/s], u + = 0.15, u~ = -0.05 
[Nm] and y/ = 0.05 [rad], respectively. By the effect of the principal ankle -joint torque 
control, the ZMP is shifted forward the ankle-joint without destroying the energy 
restoration condition. From Fig. 5 (b), we can see that the hip-joint torque becomes very 
large during the ZMP is shifted forward, but this does not affect the ZMP condition and the 
postural stability of foot is maintained. 

4.6 Discussion 

Here, we compare our method with approach proposed by Goswami et ah "energy tracking 
control." (Goswami et ah, 1997) Their approach is formulated as 

E = -X m {E-E")> ( 26 ) 

where E* [J] (constant) is the reference energy and positive scalar A is the feedback gain. 
A solution of Eq. (13) by constant torque ratio ^ > which gives the condition u = uu is 
obtained as 
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-K 



\E-E-) 



-1 



(/' + 1)4-4 
and in our case, a solution by constant torque ratio is given by 



A 



/< + l 
-1 



(27) 



(28) 



(// + !)<?, -ft 

Figs. 6 and 7 respectively show the simulation results of active dynamic walking on a level 
by the torques of Eqs. (27) and (28) without manipulating the ZMP actively. The two cases 
are equal in walking speed. From the simulation results, we can see that, in our approach, 
the maximum ankle-joint torque is about 3 times smaller than that of Goswami's approach 
and this yields better ZMP condition. In this sense, we should conclude that the mechanical 
energy must be restored efficiently but its time-change rate should be carefully chosen to 
guarantee the ZMP condition. 
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Fig. 6. Simulation results of dynamic walking by energy tracking control where X 
^ = 10.0 and E" =22.0 [J]. 
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Fig. 7. Simulation results of dynamic walking by ECC where yL = 0.34 [J/s] and ^ = 10.0- 



5. Experiments 

In order to confirm the validity of the proposed method, we carried out actual walking 
experiment using our developed machine introduced in Fig.l. All encoders of the 
servomotors are interfaced to a computer (Pentium III 1.0 GHz) running Windows 98. To 
implement the control law, we used RtMaTX (Koga, 2000) for real-time computation with 
the sampling period 1.0 [ms]. 

Since the proposed methods are so called model-matching control, they are not robust for 
uncertainty. In this research, we use model following control of the motion generated by 
VIM (Virtual Internal Model) which is a reference model in computer. Every post-impact 
condition of VIM is reset to that of the actual machine. By using the VIM, the uncertainties 
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of identification, which is crucial factor in the case of model matching control, can be 
compensated. The dynamics of VIM is given by 

M(O i )O i +C(Oj i )d i +g(O i ) = T i , (29) 

where T d is the control input to drive the VIM and is determined by ff and Q . The control 
input for the actual robot is given by 

T=M(e d )u+C(e i> e i )0 d +§(e i ) (30) 

u = 0\+K D {o\-0) + K p (O i -0) + K l \(O i -0)&t 
The virtual internal model started walking from the following initial condition: 



0(0) = 



0.68 
0.62 



,0(0): 



-0.14 
0.14 



and its state was calculated and updated in real-time. At every transition instant, the 
angular positions of VIM were reset to that of the actual machine. PID controller drives the 
ankle-joint of the swing leg during the single-support phase so that the foot keeps the 
posture horizontal. 

The experimental results are shown in Fig. 8. The adjustment parameters are chosen as 
1 = 0.075 [J/s], u + =0.15, a" =-0.05 [Nm] and ^ = 0.05 [rad] empirically. Fig. 8 (a) and (b) 
show the evolution of angular positions and velocities of the actual machine, respectively. 
The actual angular velocities are calculated by differentiation thorough a filter whose 
transfer function is 70/(.s + 70)- A stable dynamic walking is experimentally realized based 

on ECC via model following control. 



(a) angular positions [rad] 




b) angular velocities [rad/s] 




Fig. 8. Experimental results of dynamic walking by ECC. 
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6. Improving Robust Stability by Energy Feedback Control 

Eq. (26) implies that the walking system becomes robust through the reference energy 
tracking. In other words, this control expands the basin of attraction of a limit cycle, 
however, our method Eq. (13) is so called the feed-forward control, which gives only energy 
change ratio without any information to attract the trajectories. Based on the observations, in 
this section, we firstly analyze the stability of the walking cycle and then consider an energy 
feedback control law in order to increase the robustness of the walking system. 
Let us then consider an energy feedback control using a reference energy trajectory. 
Consider the following control 

E = T Su = E i -C(E-E i ), (31) 

which determines the control input so that the closed energy system yields 

±(E-E,) = -C(E-E d ) (32) 

where Q > is the feedback gain. The original energy constraint control can be recognized 
as the case of E = X and Q = in Eq. (31). By integrating Eq. (11) w.r.t. time, we can obtain 
the reference energy £ using virtual time S as 

E i (s) = E +As (33) 

where E n [J] is the energy value when s = [s]. A solution of Eq. (31) using constant torque 
ratio n yields 



Su _K-c{E- Ei ) 



-1 



(34) 



Although autonomy of the walking system is destroyed by this method, we can improve the 
robustness of the walking system. 

One way to examine the gait stability is Poincare return map from a heel-strike collision to 
the next one. The Poincare return map is denoted below as F : 



-F(x t ) (35) 



where the discrete state X k is chosen as 



em- em 

em 

em 



(36) 



that is, relative hip joint angle and angular velocities just after fc-th impact. The function F 
is determined based on Eqs. (1) and (3), but cannot be expressed analytically. Therefore, we 
must compute F by numerical simulation following an approximation algorithm. 
In the case of steady walking, the relation p (x'\ = x" holds and x" is the equilibrium point 

of state at just after transition instant. For a small perturbation § x around the limit cycle, 
the mapping function F can be expressed in terms of Taylor series expansion as 

F(x k ) = F(x' + Sx t )*x'+VF-Sx t (37) 

where 
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VFA 



dF(x) \ 

dx I 



(38) 



is the Jacobian (gradient) around X . By performing numerical simulations, VF can be 
calculated approximately. The all eigenvalues of VF are in the unit circle and the results are 
omitted. Although the robustness of the walking system is difficult to evaluate 
mathematically, the maximum singular value of VF should imply the convergence speed of 
gait; smaller the value is, faster the convergence to the steady gait is. Fig. 9 shows the 
analysis result of maximum singular value of VF w.r.t /" in the Fig. 7 case with energy 

feedback control where E =21.8575 [J] and ^ = 10.0 ■ The maximum singular value 
monotonically decreases with the increase of r . The effect of improvement of the gait 
robustness by feedback control can be confirmed. Although applying this method destroys 
autonomy of the walking system, we can improve the robustness. 



c 
E 



E 




Fig. 9. Maximum singular value of VF w.r.t. the feedback gain /" . 



6. Extension to a Kneed Biped 

This section considers an extension of ECC to a kneed biped model. We treat a simple planar 
kneed biped model shown in Fig. 10, and its dynamic equation is given by 



M(0)0 + C(0, 0)0 + g(0) = Su -■ 

We consider the following assumptions. 

1, The knee-joint is passive. 

2. It can be mechanically locked-on and off. 



[1 


1] 







-1 


«I 








w 



(39) 
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Fig. 10. Model of a planar underactuated biped robot. 

The ECC then yields a problem of how to solve the following redundant equation: 

E = l u i +[e i -0 2 )u 1 =X (40) 

for the control inputs in real-time. Since the knee-joint is free, we can give the control input 
by applying the form of Eq. (28) as 



Su 



?, 



(/i+i)4-«, 

o 



-l 



(40) 



On the other hand, a kneed biped has a property of obstacle avoidance, in other words, 
guaranteeing the foot clearance by knee-bending. To improve the advantageous, we 
introduce an active knee-lock algorithm proposed in our previous work (Asano & Yamakita, 
2001) in the following. The passive knee-strike occurs when 0=0 during the single- 
support phase, and its inelastic collision model is given by 

M(0)0 + = M(0)0- -J] A, (41) 
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where/ = [o \ _jl T and X 1 is the Lagrange's indeterminate multiplier vector and means 

the impact force. We introduce an active knee-lock algorithm before the impact and 
mechanically lock the knee-joint at a suitable timing. Let us then consider the dissipated 

mechanical energy at this instant. Define the dissipated energy Ai? ks as 

(42) 



AE h A-(0 + ) T M(0)0 + --(0-) T M(0)0- < 



This can be rearranged by solving Eq. (41) as 

1 



AE ks =--(0-) jj(j,M- i jjy Jfi- 



{&2 -^f . 



(43) 



: , and this leads 



2 v ' ■ v ' ' ' ' 2j,m-'j; 

This shows that the condition to minimize the energy dissipation is ff' 
AR , = • In general, there exists the timing in the kneed gait. After locking-on the knee- 
joint, we should lock-off it and the timing should be chosen empirically following a certain 
trigger. In this section, we consider the trigger as X = [m] where x ls the- X-position of 

the robot's center of mass. Fig. 11 shows the phase sequence of a cycle with the knee-lock 
algorithm, which consists of the following phases. 

1. Start 

2. 3-link phase I 

3. Active knee-lock on 

4. Virtual compass phase (2-link mode) 

5. Active knee-lock off 

6. 3-link phase II 

7. Passive knee-strike 

8. Compass phase (2-link mode) 

9. Heel-strike 

Fig. 12 shows the simulation results of dynamic walking by ECC where A = 5.0 and 
ju = 4.0 • The physical parameters are chosen as Table 2. From Fig. 12 (b) and (d), it is 
confirmed that the passive knee-joint is suitably locked-on without energy-loss, and after 
that, active lock-off and passive knee-strike occur. Fig. 13 shows the stick diagram for one 
step. We can see that a stable dynamic bipedal gait is generated by ECC. 




Fig. 11. Phase sequence of dynamic walking by ECC with active lock of free knee-joint. 
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Fig. 12. Simulation results of dynamic walking of a kneed biped by ECC where A 
[J/s] and /J = 4.0 . 
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Table 2. Parameters of the planar kneed biped. 




Fig. 16. Stick diagram of dynamic walking with free knee-joint by ECC 
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7. Conclusions and Future Work 

In this chapter, we have proposed a simple dynamic gait generation method imitating the 
property of passive dynamic walking. The control design technique used in this study was 
shown to be effective to generate a stable dynamic gait, and numerical simulations and 
experiments have proved its validity. 

The authors believe that an energy restoration is the most essential necessary condition of 
dynamic walking and its concept is worth to be taken into consideration to generate a 
natural and energy-efficient gait. In the future, extensions of our method to high-dof 
humanoid robots should be investigated. 
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1. Introduction 



Recently, numerous collaborations have been focused on biped robot walking pattern to trace 
the desired paths and perform the required tasks. In the current chapter, it has been focused 
on mathematical simulation of a seven link biped robot for two kinds of zero moment points 
(ZMP) including the Fixed and the Moving ZMP. In this method after determination of the 
breakpoints of the robot and with the aid of fitting a polynomial over the breakpoints, the 
trajectory paths of the robot will be generated and calculated. After calculation of the trajectory 
paths of the robot, the kinematic and dynamic parameters of the robot in Matlab environment 
and with respect to powerful mathematical functions of Matlab, will be obtained. The 
simulation process of the robot is included in the control process of the system. The control 
process contains Adaptive Method for known systems. The detailed relations and definitions 
can be found in the authors' published article [Musavi and Bagheri, 2007]. The simulation 
process will help to analyze the effects of drastic parameters of the robot over stability and 
optimum generation of the joint's driver actuator torques. 

2. Kinematic of the robot 

The kinematic of a seven link biped robot needs generation of trajectory paths of the robot 
with respect to certain times and locations in relevant with the assumed fixed coordinate 
system. In similarity of human and robot walking pattern, the process of path trajectory 
generation refers to determination of gait breakpoints. The breakpoints are determined and 
calculated with respect to system identity and conditions. 

Afterward and in order to obtain comprehensive concept of the robot walking process, the 
following parameters and definitions will be used into the simulation process: 

- Single Support phase: The robot is supported by one leg and the other is suspended in air 

- Double support phase: The robot is supported by the both of its legs and the legs are in 
contact with the ground simultaneously 

: Total traveling time including single and double support phase times, f - 

T '■ Double support phase time which is regarded as 20% of r - 
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-T '■ The time which ankle joint has reached to its maximum height during walking cycle. 

: Step number k - 

Ankle joint maximum height: ff - 

- L '■ The horizontal traveled distance between ankle joint and start point when the ankle 

joint has reached to its maximum height. 
Step length: £> - 

: Foot lift angle and contact angle with the level ground a a - 

- A : Surface slope 

- h : Stair level height- 

- H . : Foot maximum height from stair level 

- X ed : The horizontal distance between hip joint and the support foot (Fixed coordinate 
system) at the start of double support phase time. 

- X d : The horizontal distance between hip joint and the support foot (Fixed coordinate 

system) at the end of double support phase time. 

- F. C. S: The fixed coordinate system which would be supposed on support foot in each step. 
-M.C : The mass centers of the links 

- Saggital plane: The plane that divides the body into right and left sections. 

- Frontal plane: The plane parallel to the long axis of the body and perpendicular to the 
saggital plane. 

The saggital and frontal planes of the human body are shown in figure (1.1) where the 
transverse plane schematic and definition have been neglected due to out of range of our 
calculation domain. 




agittal 



frontal 



Fig. (1.1). The body configuration with respect to various surfaces. 

The main role for the optimum trajectory path generation must be imagined upon hip and 
ankle joints of the robot. On the other hand, with creating smooth paths of the joints and 
with the aid of the breakpoints, the robot can move softly with its optimum movement 
parameters such as minimum actuator torques of joints (Shank) including integrity of the 
joints kinematic parameters. 
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The important parameters of the robot can be assumed as the listed above and are shown in 
figures (1.2) and (1.3). 

= kT+T 




D t D, 

Fig. (1.2). The robot important parameters for calculation of trajectory path of a seven link 
biped robot. 



t = kT 




t = (k + \)T c 




Fig. (1.3). The variables of hip: X gd X j . 

With respect to saggital investigation of the robot, the most affecting parameters of the 
mentioned joints can be summarized as below: 

1) Hip joint 

2) Ankle joint 

Obviously, the kinematic and dynamic attitude of shank joint will be under influence of the 
both of above mentioned joints. As can be seen from figure (1.2), the horizontal and vertical 
components of the joints play a great role in trajectory paths generation. This means that 
timing-process and location of the joints with respect to the fixed coordinate system which 
would be supposed on the support foot have considerable effects on the smooth paths and 
subsequently over stability of the robot. Regarding the above expressions and conditions, 
the vertical and horizontal components of the joints can be categorized and calculated as the 
following procedure. With respect to the conditions of the surfaces and figure (1.2) and (1.3), 
the components are classified as below: 



2.1) Foot angle on horizontal surface or stair 

'-*- ' = «". 



'«(<) = 



-9h 



t = kT c +T d 
t = (k + \)T c 
t = (k + l)T c +T, 



(1) 
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2.2) Foot angle on declined surfaces 



»(0 = 



(A + qJ t = kT c 
A-q b t — kT c + T d 

A + q f t = (k + \)T c 
A + q sf t = (k + \)T c +T d 



(2) 



2.3) The displacements of Horizontal and Vertical Foot Traveling Over Horizontal 
Surface or Stair 

With respect to figures (1.2) and (1.3), the horizontal and vertical components of ankle joint 
can be shown as below: 



,(0 = 



,(0 = 



At)- 





kD 


t = kT c 




kD s + l m sing,, H — 


t = kT c+ T d 




/ o/ (l-cos % ) 




)=< 


kD s +L m 


l = kT c+ T „, 




{k + 2)D s - l m smq f — 


t = (k+iyr e 




l ab (\-cosq f ) 






(k + 2)D, 


t = (k + l)T c +T d 


X+L 


t=kT c 


^H/Si%H„co% 


t=kT c +T d 


<H„ 


t=kT c +T m 


\e+L^f+L C0 ^f 


t=(k+\)T c 


\Man 


t=(k+l)T c +T d 


(k - \)h„ + /„„ 


t = kT c 


[k - l)h lt +l af smq b J i — 


t = kT c +T d 


l an COSq b 




k K + H „ 


t = kT c + T m 


k + X)h sl + l ah sin q f h — 


t = (k + l)T c 


l an cosq. 




{k-t 


!)/»„ + l m 


t = (k + \)T c + T d 



(3) 



(4) 



(5) 



2.4) The displacements of Horizontal and Vertical Foot Traveling Over declined Surface 

££>cos>t-/ J/1 sin/l t=kT c 

(kD s +l af )co$X+--- t=kT c +T d 

/ a „sin^ 6 -X) -l a/ cosq b -A) 
x ad Jt)=\(kq+LJcosl t=kT c +T m u* 



(kty +L ao )cosl 
((k+2)D s -lJcosl--- 
l all smq f +£)+l ab cos(lf +X) 
(k+2)D s cosi-^sin/t 



t={k+\)T c 



={k+i)T c +T d 
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(7) 



kD K sinA+l an cosA, t=kT c 

(kD s +l a/ )sirU+-- ■ t =k%+T d 

+/ O „cos^,, -X) +1. sin^ ;> -X) 
z ad Jf)^-(kD s +LJsiiti+H ao cosZ t=kT c +T ln 
((k+2)D s -lJsinA + --- t = (k + l)T c 

l {l „sinij / +X)+l ah cos^l2-(q t +X)) 
(k+2)D s sirU+l an cosi t=(k + l)T c +T d 

Assuming the above expressed breakpoints and also applying the following boundary 
condition of the robot during walking cycle, generation of the ankle joint trajectory path can 
be performed. The boundary conditions of the definite system are determined with respect 
to physical and geometrical specifications during movement of the system. As can be seen 
from figure (1.2) and (1.3), the linear and angular velocity of foot at the start and the end of 
double support phase equal to zero: 



'.(W,) 







+ T d ) = 



x a (kT c ) = 

*.((* + \)T c + T d ) = 



(8) 



i a {kT c ) = 

i„((* + \)T c + T d ) = 
The best method for generation of path trajectories refers to mathematical interpolation. There 
are several cases for obtaining the paths with respect to various conditions of the movement 
such as number of breakpoints and boundary conditions of the system. Regarding the 
mentioned conditions of a seven link biped robot, Spline and Vandermonde Matrix methods 
seem more suitable than the other cases of interpolation process. The Vandermonde case is the 
simplest method with respect to calculation process while it will include calculation errors 
with increment of breakpoint numbers. The stated defect will not emerge on Spline method 
and it will fit the optimum curve over the breakpoints regardless the number of points and 
boundary conditions. With respect to low number of domain breakpoints and boundary 
conditions of a seven link biped robot, there are no considerable differences in calculation 
process of Vandermonde and Spline methods. For an example, with choosing one of the stated 
methods and for relations (7) and (8), a sixth-order polynomial or third-order spline can be 
fitted for generation of the vertical movement of ankle joint. 



2.5) Hip Trajectory Interpolation for the Level Ground [Huang and et. Al, 2001] and 
Declined Surfaces 

From figures (1.2) and (1.3), the vertical and horizontal displacements of hip joint can be 
written as below: 



"h,Hor, 
stair 



kD, +x„ 



kT 



(k + \)D s - x sd t = kT c + T d 



(k + \)D s +x ed 



(k + l)T c 



(9) 
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(kD s +x c</ )cos/l t = kT, 

((k+\)D s -x sd )cosJi t = kT c +T d 
{{k + 1)7) + x ed ) cos/L t = {k+ 1)7, 



(10) 



H, 



t = kT, 

t = kT c +.5(T c -T d ) 

t = (k + \)T c 



(11) 



H h mm COS X 



,t kT c 
(kD s x ed )sinA 
7/ fimax cos^ -,t kT c .5(7; T d ) 

(kD s x sd ) sin A 
^;,mmCos/l •••, t (k 1)7; 
((k l)D s x ed )sinA, 



{k-\)h s +H h 
kh..+H Lm ,„ 



J = kT c 

t = kT+.5(T,-T d ) 
t = (k + 1)7; 



(12) 



(13) 



Where, in the above expressed relations, H and H indicate the minimum and 

maximum height of hip joint from the fixed coordinate system. Obviously and with respect 

to figure (1.2), the ankle and hip joint parameters including X a , Z and X h , Z h play main 

role in optimum generation of the trajectory paths of the robot. With utilization of relations 
(1)-(13) and using the mathematical interpolation process, the trajectory paths of the robot 
will be completed. 

nip a, swing a , sup nip 



Z — Z 
hip a, sup 




Fig. (1.4). The link's angles and configurations. 

Regarding figure (1.4) and the trajectory paths generation of the robot based on four important 
parameters of the system (J^,Z a and X h , Z h ), the first kinematic parameter of the robot can 
be obtained easily. On the other hand / with utilization of inverse kinematics / the link's angle of 
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the robot will be calculated with respect to the domain nonlinear mathematical equations. As 
can be seen from figure (1.4), the equations can be written as below: 

/, cos(/r - 0j ) + / 2 cos(;r - 2 ) = a 

l x sin(/r — #j) +/ 2 sin(;r -0 2 ) = b 

I 3 cos(# 3 ) + l 4 cos(<? 4 ) = e 

l 3 sin(6 , 3 ) + l A sin(# 4 ) = d 
Where, 



(14) 
(15) 



- hip 



' a , Sup 



d- 



- Y — Y 

hip a,s\ving 

— 7. — 7, 

hip clawing 



The all of conditions and the needed factors for solving of the relations (14) and (15) 
have been provided. The right hand of relations (14) and (15) are calculated from 
the interpolation process. For the stated purpose and with beginning to design 
program in MALAB environment and with utilization of strong commands such as 
fsolve, the angles of the links are calculated numerically. In follow and using 
kinematic chain of the robot links, the angular velocity and acceleration of the links 
and subsequently the linear velocities and accelerations are obtained. With respect 
to figure (1.5) and assuming the unit vectors parallel to the link's axis and then 
calculation of the link's position vectors relative to the assumed F.C.S, the following 
relations are obtained: 




Fig. (1.5). The assumed unit vectors to obtain the position vectors. 

'o = ',„.< ( C0S (/L< + 9fV + sinifln.c + 1f)K) 
r l = (/„ cos(/? w; +q / ) + l cl cos(0 l -A))I 
+ (I, sin(6> - X) + /„ sin(/?, o , + q f ))K 

r 2 = <7 cos(/?, o , + q f ) + /, cos(6> - X) 
- l cl cos(/r - 2 + X))I + (/, sin(6> - X) 
+ /„ sin(/?, o , + q f ) + l c2 sin(;r - 2 + Xj)K 



(16) 
(17) 

(18) 
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(19) 



F 3 = (/ COS(/? to , +q f ) + l l cos(<9, - /I) 
- 1 2 cos(;r - # 2 + A) - l cl cos(# 3 - A))I 
+ (/, sin(«9, - A) + / sin(/?, o , + ?/ ) 
+ l 2 sin(;r - <9 2 + A) - l ci sin(# 3 - A))K 

r 4 = (/„ cos(y?, o , +q f ) + 1, cos(6' 1 - A) 

- l 2 cos(;r - 9 2 + A) - / 3 cos(<9 3 - A) (20) 

- l c4 cos(6> 4 - /I))/ + (/, sin^, - /I) 

+ 1 sin(/? M/ + ^ / ) + l 2 sin(;r - <9 2 + A) 

- /, sin(6» 3 - A) - l c4 sin(<9 4 - A))K 

F 5 = (/ cos(/? to , + 47) + /, cos(6» - /I) 

- 1 2 cos(;r - # 2 + A) - 1 3 cos(<9 3 - A) 

- 1 4 cos(# 4 -A) - l 5mc cos(^/2 - A + J3 sms - q„))I ( 21 ) 

+ (/, sin(«9, - A) + 1 sin(/? M + q f ) 

+ 1 2 sin(;r - 6 2 + A) - / 3 sin(<9 3 - A) 

- 1 4 sin((9 4 - A) - l 5mc sin(;r /2-A + J3 sms - q h )))K 

r,„ r = (h cos(y? ( „, +q f ) + l l cos(6» - A) 

- 1 2 cos(;r -9 2 +A) + l lor cos(;r / 2 - 9, or - A))I ( 2 2) 

+ (/, sin(#, -A) + l sin(/? (o , + q f ) 

+ 1 2 sin(;r -6 2 +A) + l tor sin(;r / 2 - tm - A))K 

As can be seen from relations (16)-(22), the all of position vectors have been calculated with 
respect to F.C.S for inserting into ZMP formula. The ZMP concept will be discussed in the 
next sub-section. Now, with the aid of first and second differentiating of relation (16)-(22), 
the linear velocities and accelerations of the link's mass centers can be calculated within 
relations (23)-(29). 

n, =L.M-s™(fi mc + q f )I + cos(ft m c +q f )K) (23) 

v, = Ho®o sin (/L + if) - hA sin (#i - W ( 24 ) 

+ (/^ cos(<9, - A) + 1 cos(/?, o , + q f ))K 

v 2 = (-! w„ sin(/? to , + q f ) - l x S\ sin(6> - A) 

(25) 
- / c2 « 2 sin(;r - 2 + A))I + (/,«, cos(6' 1 -A) 

+ l (b cos(/? M , + q f ) - l c2 w 2 cos(;r - 6 2 + A))K 

v 3 = (-/ « sin(/? M , +q f )-l l a>, sin(6>, - A) 

- l 2 a> 2 sin(;r -6 2 + A) + / c3 « 3 sin(0 3 - A))I ,_*\ 

+ (/,«, cos(<9, -A) + l w cos(J3 tol +q f ) 

- l 2 a> 2 cos(«- -6 2 + A)- / r3 « 3 cos(<9 3 - A))K 
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(27) 



v 4 = (-/„«„ sin(/?,„, +q f )- l x a> x sin(0, - A) 

- l 2 a> 2 sin(;r - Q 2 + A) + / 3 <5 3 sin(# 3 - A) 

+ l c4 a> 4 sin(6' 4 - A))I + (/,«, cos(<9, - A) 

+ l 3 cos(/?, o , +q f ) -l 2 a> 2 cos(;r-<9 2 + A)- 

/ 3 <» 3 cos(#, -A)- l c4 3 4 cos(# 4 - A))K 

v 5 =(-l a\ > sm(fi lo ,+q f )-l l a\smP l -A) 

-/ 2 ffl, sin^r - 6> 2 + A) + / 3 tt, sm^> - A) 

+l 4 a> 4 sinP 4 -A)-l 5mc Gi > sin<p;/2-A+/3 smc -q h ))I (28) 

+(/,«} cosft -X)+/ ^ cof C+9/) 

-l 2 a 2 cos^:-0 2 +A)-l 3 a i cos@ 3 -A) 

-l 4 ® 4 cos{9 4 -A)+I 5mc at j cosfc/2-A+/3 smc -q b ))K 

/ 2 ^sin^-<9 2 +A)+/ t > tor sin^/2-6? ,-A)X ( 2 9) 

+(q/ 1 cosfl -/L)+/ q ) cos^„,+^ / ) 

-/ 2 4 cos^r-<9 2 +A) -l lor a) lor cos$r/2-0 lor -A))K 

Accordingly, the linear acceleration of the links can be calculated easily. After generation 
of the robot trajectory paths with the aid of interpolation process and with utilization of 
MATLAB commands, the simulation of the biped robot can be performed. Based on the 
all above expressed relations and the resulted parameters and subsequently with 
inserting the parameters into the program, the simulation of the robot are presented in 
simulation results. 

3. Dynamic of the robot 

In similarity of human and the biped robots, the most important parameter of stability of 
the robot refers to ZMP. The ZMP (Zero moment point) is a point on the ground whose sum 
of all moments around this point is equal to zero. Totally, the ZMP mathematical 
formulation can be presented as below: 



'y\m j (gco&A + z i )x j — y /B,(gSfflA+jc,.)z, — /'Ifi, 

i=l 1=1 Nl 



(30) 



Where, x and '£. are horizontal and vertical acceleration of the link's mass center with 

respect to F.C.S where Q is the angular acceleration of the links calculated from the 

interpolation process. On the other hand, the stability of the robot is determined according 
to attitude of ZMP. This means that if the ZMP be within the convex hull of the robot, the 
stable movement of the robot will be obtained and there are no interruptions in kinematic 
parameters (Velocity of the links). The convex hull can be imagined as a projection of a 
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pyramid with its heads on support and swing foots and also on the hip joint. Generally, the 
ZMP can be classified as the following cases: 

1) Moving ZMP 

2) Fixed ZMP 

The moving type of the robot walking is similar to human gait. In the fixed type, the 
ZMP position is restricted through the support feet or the user's selected areas. 
Consequently, the significant torso's modified motion is required for stable walking of 
the robot. For the explained process, the program has been designed to find target angle 
of the torso for providing the fixed ZMP position automatically. In the designed 

program, (J torso shows the deflection angle of the torso determined by the user or 

calculated by auto detector mood of the program. Note, in the mood of auto detector, 
the torso needed motion for obtaining the mentioned fixed ZMP will be extracted with 
respect to the desired ranges. The desired ranges include the defined support feet area 
by the users or automatically by the designed program. Note, the most affecting 
parameters for obtaining the robot's stable walking are the hip's height and position. By 

varying the parameters with iterative method for X j,X d [Huang and et. Al, 2001] and 

choosing the optimum hip height, the robot control process with respect to the torso's 

modified angles and the mentioned parameters can be performed. To obtain the joint's 

actuator torques, the Lagrangian relation [Kraige, 1989] has been used at the single 

support phase as below: 

(31) 
T i = H(q)q + C(q, q)q + G{q, ) 

where, z = 0, 2, • • • 6 and H , C, G are mass inertia, coriolis and gravitational matrices of the 
system which can be written as following: 



H{q)-- 



K K Ki K hi KK, 

K K K K ks Kkn 

'hi 'hi 'hi, ^4 ^55 'he'hi 

h h h h h h h 

"41 "42 "43 "44 "45 "46 "47 

hi hi hi h 4 hs hehi 

hi hi hi h-,4 hs hAi. 



QsS= 



q, c\ 2 q 3 c l4 c l5 q 6 ^ 7 

^21 "X2 ^3 ^4 ^5 i.b ^7 

°1\ ^32 ^3 ^34 ^35 °ib ^37 

c 41 c 42 c 43 c 44 c 45 c 46 c 41 

C S I C 51 C 53 C 5A C 55 C 56 C 57 

?b 1 C 62 ^"63 C 64 ^"65 ^66 ^67 



G{q) 



Obviously, the above expressed matrices show the double support phase of the movement 
of the robot where they are used for the single support phase of the movement. On the other 
hand, the relation (31) is used for the single support phase of the robot. Within the double 
support phase of the robot, due to the occurrence impact between the swing leg and the 
ground, the modified shape of relation (31) is used with respect to effects of the reaction 
forces of the ground [Lum and et. Al. 1999 and Westervelt, 2003, and Hon and et. AL, 1978]. 
For the explained process and in order to obtain the single support phase equations of the 
robot, the value of q (as can be seen in figure (1.4)) must be put equal to zero. The 

calculation process of the above mentioned matrices components contain bulk mathematical 
relations. Here, for avoiding the aforesaid relations, just the simplified relations are 
presented: 
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2lJ 3 cos<g 3 -ft )-2/,/ c4 coslft -q 4 )-2l 2 l 3 cos(q 3 -q 2 )-2l 2 l c4 cos<g 2 - ft, ) + 2/ c4 / 3 coslft -ft))]+ 

[/Mj (/ 2 + / 2 + / 2 + 1 4 + / 2 /sw ,„ g + 2/,/ 2 COS(ft - ft ) - 2/,/ 3 COS& 3 - ft ) -2/,/ 4 COS& - ft ) — • • 

yjcfwingCos^ -(tvI2)+ q /swmg - p fswmg ) - 2/ 2 / 3 cos(ft - ft ) - 2/ 2 / 4 cos& 4 - ft ) - • • 

VJcMmgWsfa -(tv 1 2) + q fswing - p fswin ) + 2/ 3 / 4 coslft, - ft ) + • • • 

VJcMmgWsfa -(tv 1 2) + q fswmg - p fswin ) + 2l 3 l cfswmg cos(g 3 -(tv 1 2) + q fswmg - (3 fswmg ))]+ ■ ■ ■ 

HorQl + l i + l L,o+ 2 h l 2Cos(g 2 -ft)+2/ 1 / ctorao cosfft orao -ft -(tv/2))+--- 

2/ 2 ^ , !0 cos^ ,, so + ft +(tvI2))]+I x +I 2 +I 3 +I 4 +I 5 +I lorso 
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h n =K( / C 2 2 +Vc2 cos <0 , 2 "^iXH+Kft 2 + l a +/i/ 2 cos(ff 2 -qj-ljacosfa -q 3 )--- 

2/ 2 / c3 cos(g 2 -<7 3 ))]+[ot 4 (/ 2 2 +/ 3 2 +/ c 2 4 +/,/ 2 cos^ 2 -? 1 )-/ 1 / 3 cos^ 3 -^) 

l l l c4 cos<g l -q 4 )-2l 2 l 3 cos<g 3 -q 2 )-2l 2 l c4 cos<g 2 -q 4 ) + 2l c4 l 3 cos(q 3 -? 4 ))]+-- 

K (4 2 + /2 + / 4 2 + /2 / 5 „™ g + V2 cos <02 " ft ) " V 3 cos <ft " ?i ) " hh cos fe " ft ) - • • ' 
hhfamn g co ^i -(71 1 2) + q Mwg - P jsmng ) - 2/ 2 / 3 cosfe - <? 2 ) - 2/ 2 / 4 cos^ 4 - q 2 ) - • • • 

^c/^cosfe - (ff / 2) + <?/,,,,„£ " A/,, w „ g ) + 2/3/4 cos fe " <7 3 ) + ' • • 

2/ 4 /c/.™,^cos(^ 4 -(71 1 2) + q fswwg - P jsmng ) + 2l 3 l cfswing cos(q 3 -(nl2) + q fswmg - P fswing ))]+ ■ ■ ■ 

HoMl +/L™ + /i/2COsfe -q l ) + hl clorso cos< c q torso -q l -(7rl2)) + 2l 2 l ctorso cos(g lorso + q 2 +(x/2))] 
+I 2 +h+h+h+I torso 



h 14 =K(/rf ~hh} C0S ^7l ~? 3 )-y c 3 C0S fe -ft))]+K(/3 2 + /2 4 -/1/3 C0S fe -?l)" 



'1V4 

,2 



cos^, -q 4 )-l 2 l 3 cos<q 3 -q 2 )-l 2 l c4 cos<q 2 -q 4 ) + 2lJ 3 cos<q 3 -q 4 ))]+[m 5 (l 3 +/ 4 +• 



/c/ sw ,„ g - hhcos(3 3 -q l )-l ] l 4 costq 4 -q l )- hl cfsyvin fO^ -(kI2) + q fswing - P /swmg ) — ■■ 

hh cos ^3 -q 2 )-hh cosfe -^l-y^cos^ -(^l2) + q f3Vting -P fswin ) + 2l 3 h cosiq 4 -q 3 ) + - 

2lJc M i„ g cos^ 4 -(rr/2) + q M . ing -p fswi J + 2l 3 l cfswing cos<q 3 -(nll) + q fsviin - P fswlng ))]+I 3 +h+h 



h ls = K (/ c 2 4 - /j/ c4 cosfe, - ? 4 ) - y c4 cosfe 2 - # 4 ) + lj } cos(q 3 - q 4 ))] + [m 5 (l\ +■■■ 
Ic/swing ~ hh cosfe 4 - g,) - Z^^ cosfe, - (;z72) + ? /m ,„ g - ^ /m , Bg ) - / 2 / 4 cosfe 4 - ft) — ■ 
Vgh** cosfe 2 - (» / 2) + q fswing - P fming ) + 1,1 4 cos(q 4 - q 3 ) + ■ ■ ■ 
2hh fswi „ g cos(g 4 - Or/2) + q Jiwing - P fsKing ) + lj cfswing cos(g, -(xl2) + q fswing - P fswhlg ))] + 1 4 + I s 



K = \Ph Qcfimng ~ hhfi W ing C0S (ai ~ &I2) + q fnvi „ g ~ P fswing ) " / 2 / c/sw ;„ g COSfe - (K 1 2) + q fawing - P fiwjng ) 

+ hh f ^n g ^s(q 4 -(xl2) + q fswlng - p fswing ) + hh fswing cos(g 3 -(xl2) + q fswing - P fiwing ))]+I 5 



K =\.m, orso (l 2 ctorso +kl ctorso zos(-q tmo -(x/2)-q l ) + l 2 l ctorso cos(q lorso +(x/2) + q 2 ))] + I K 
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h 21 = K(£ 2 + l Je C0 <<l2 -<P)+ l Je2 COS (l2 ~ ft ))] + [«3 (^ + l c3 + hh C0 ^2 ~ <P) — ' 

lj e cos(-q 3 +(p) + lj 2 cos(q 2 - ft ) - hhi cos(ft - ft ) - 2l 2 l c3 cos(ft - ft ))] + [m 4 {l\ + 1\ + l] 4 + ■ ■ ■ 

l 2 l e cos(g 2 -(f))- l 3 l e cos(ft —<p) — l c4 l e cos(-ft, +<p) + /,/ 2 cos(ft - ft ) - lj 3 cos(ft - ft ) 

/,/ c4 cos(ft -ft)-2/ 2 / 3 cos(ft -ft)-2/ 2 / c4 cos(ft -q 4 ) + 2lJ 3 cos(q 3 -ft,))]+ — 
[m 5 (l 2 + 1 3 + 1 4 + l^ wing + l 2 l e cos(q 2 -<p)- l 3 l e cos(ft -<p)- l 4 l e cos(ft -<p) — - 

hfi«mje C0S # " (^ / 2 ) + <1 fining ~ P fining) + hh COS(ft " ft ) " 1J 3 COS(ft - ft ) — • • 

/,/ 4 cos(ft -q l )-lJ cfswing cos(q 1 -(x/2) + q fswing -/3 fswing )-2l 2 l 3 cos(g 3 -ft) — •• 
2/ 2 / 4 cos(ft, - ft ) - 2l 2 l cfswing cos(ft - 072) + q fswing - P fswing ) + 2l 3 l 4 cos(q 4 - ft ) + • • • 

2 hhfi K ing C0S fe 4 - (*/ 2 ) + fti^g " £**») + ^v^g «*& ~ (*/ 2) + ft^ - £ /sm „ g ))] + • • • 
K OT (4 2 + Co™ + kh cos (?2 ~ ?>) + y^™ COS(-ft„ rao -<p-{Ml2))+ lj 2 COS(ft - ft ) + • • • 

^,„,cos(-ft „ -ft -(^/2)) + 2/ 2 / cM „ o cos(ft ors „ + ft +(^/2))]+/ 2 +I 3 +I 4 +I 5 +I torm 



h 22 =[wj(4 2 +/ 1 /, 2 cos^ r 2 -4))]+[?«;(/ 2 +4 3 +// 2 cos^ 2 -ft)— /[fccosfo -ft) — 

2/ 2 / c3 cos^ 2 -%))]+K(/ 2 +/ 2 +/ 2 4 +// 2 cosfr 2 -^-/j/jCOS^j -ft) -/,(.„ cos^ -ft) — • • 

2/ 2 / 3 cosfe -q 2 )-2! 2 l c4 cos<q 2 -q 4 )+2lJ 3 co%<q 3 -ft))]+[m,(/ 2 +/ 2 +/ 4 +4„,-„ g +- • ■ 

/,/ 2 cosfe -9,)-/,/ 3 cosfe -ft)-# 4 cosfe -ft)-/,/ c/ra ,,„ g cos& -(«" / 2)+ft„„„ g -^ s „,„ i )-2/ 2 / 3 cosfe -ft)- 

-2I 2 l 4 cos<q 4 -q 2 )-2l 2 l cfsw!iig cos<q 2 -(x/2)+q fswt „-fi fswin )+2l } l 4 costo 4 -ft)+- • ■ 

21l mn fos<q 4 -(a/2) +q fmtng -p mn ) +21 3 l cfswin fos^ 3 -(a/2)+q fmlng -p fswln ))-]+- 

\PhorQl + l L.so+h I 2Coslq 2 -q l )+lLo rsi POS^q urso -q 1 -(a/2))+2l 2 l ctorsi pos<q !orso +q 2 +(^/2))]+- • • 

I 2 +I 3 +I 4 +I 5 +I torso 



h 23 =[W2(4)]+K(4 2 +4 2 3 -2/2^3 cos^r 2 -ft))> K(/ 2 +/ 3 +/ ( 2 4 -2/ 2 / 3 cosfe -ft)- • • 

2/ 2 / c4 cosft, -q 4 )+21J 3 cosfe -ft))}f K(/ 2 +/ 3 2 +/ 4 +l^, wing -2l 2 l 3 cosfe -ft)-2/ 2 / 4 COsfe -ft)- 

2/ 2 ^™,„gCos^ 2 -(nl2)+q fswing -p fswing )+2l 3 l 4 cosfe -q 3 )+21J c/swing cos^ 4 -(x/2)+q fsV!ing -P fswing )+- 

^W * -(^/2)+ft, !W ,. K4 -^ /iw ,, g ))]+KX/ 2 +/ 2 „ +2/ 2 / CM „,cos^„ +ft +(»/2))>- • • 

I 2 +I 3 +I 4 +I 5 +I, orso 
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h 24 =Dm,(/ 2 3 -l 2 l ci cosq 2 -q 3 ))\r[m 4 (!l +l 2 c4 -l 2 l 3 cosq 3 -q 2 )-l 2 l c4 cosq 2 -q 4 )+llj 3 cosq 3 -q 4 ))}*- • • 

2/ 3 / 4 cosfr, -q 3 )+2l 4 l cfswing cosq 4 -(n/2)+q fswin -P fswin )+2J 3 l cfswing cosq 3 -{xl2)+q fswirlg -p fswirl ))Y- ■ ■ 
I 3 +I 4 +I 5 



h 25 =K(/' 4 -/ 2 / c4 cos^ 2 -q 4 )+lj 3 cosq 3 -g 4 ))]+K(/ 4 +ll fswm -l 2 l A cosq 4 -q 2 ) — - ■ 
hhfimnf os ^2 -(x/2)+q M „„ g -/3 fsw ,„)+hh cosq 4 -q 3 )+2J 4 l cfswiiig cosq 4 -{nll)+q fswing -p fswi „)+- 
hlfivinfosfa -(x/2)+q fsw ,„-/3 /swm ))]+I 4 +I 5 



\6< m Sl, smng -hhf sv ., n f^2-(xl2) + q fswm - 



h 27 = [m lor JlL,, + hhomo cos(? torao + (*/ 2) + q 2 )} + 7 ft 



h 3i =K( / c3 ~lJ e cos^q 3 +0-/,/ e3 cos^ 1 -q 3 )-l 2 l c3 cos<q 2 -q 3 ))]+[m 4 (l 2 3 +/ c 2 4 — • 

l 3 l e cos(q 3 -(p)-lJ e cos{-q 4 + (p)-l l l 3 cos(g 3 -q [ )-l l l c4 cos(g l -q 4 )-l 2 l 3 cos(g 3 -q 2 ) 

l 2 l c4 cos<g 2 -q 4 ) + 2lJ 3 cos<g 3 -q A ))]+[m s Q 3 +/ 4 +l 2 cfswing -l 3 l e cos<q 3 -<p) — -- 
hh cosfa-p)- l cfswil J e cos^ - (*/ 2) + ?JJirtw - ^ /!u ,„ g ) - /,/ 3 cosfe - ?I ) - • • 
/,/ 4 cos<g 4 - q x ) - l x l cfswin fos<q x -(xl2) + q fswmg - p fswmg ) -l 2 l 3 cos<g 3 -q 2 )--- 
l 2 l 4 cos(q 4 -q 2 )- l 2 l cfswmg cos(q 2 - {nil) + q fsviing - p fswm ) + 2l 3 l 4 cos(q 4 -q 3 ) + --- 
2lJ cfs „ ing cos<q 4 -(x/2) + q fswing -/1 fswing )+2l 3 l cfswing cos<q 3 -(nl2) + q fswmg -/3 fswmg ))]+I 3 +I 4 +I 5 
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h 32 =K( / , 2 3 -^3 C0S <<7i -q 3 )-hh3 C0S <G2 -ft^l+Kft 2 H 2 4 -V 3 C0S fe -?i)— •• 



Tc4 



cos^ -q 4 )-l 2 l 3 cos(g 3 -q 2 )-l 2 l c4 cos(g 2 -q 4 ) + 2lJ 3 cos(g 3 -q 4 ))]+[m 5 (l 3 + / 4 +•• 



lcMin g -hh cos fe " ?i ) -V 4 c °s&, - ft) - hh/s^gcos^ - Or/2) + g^ - ^,„ g ) - • • 

/ 2 / 3 cosfe - ^ 2 ) - / 2 / 4 cos& 4 - ?2 ) - l 2 l cfiwmg cos<q 2 -(itl2) + q /swmg - p fswin ) + 2/ 3 / 4 cos^ 4 - ^ ) + • • • 

2l JcMi ng c°s<Gl4 ~{7i 1 2) + q fswing - p fswing ) + 21 3 l cfswlng cos<q 3 -(itl2) + q fswmg - p fsw ,J)]+ ■ ■ ■ 



h +h +h 



h 33 =K(4 ~Vc3 C0S <?2 "ft))}^K( / 3 2 +^4 -% C0S ^3 -<h)~Ud>. C0S fe ^d+^Ji COSfc ~? 4 ))]f • • 

K(4 2 +^ +4»™ g - / 2 / 3 cos & -? 2 )- / 2 / 4 cos^ 4 -q 2 )-l 2 l cfswin fosq 2 -(x/2)+q M . m -/3 Min )+- ■ ■ 
21,1, cosq 4 -q 3 )+21J cfswing cosq 4 -(it/2)+q fiwing -p fiwin )+2l 3 l cfswing zosq 3 -(it/2)+q fiwlng -p /swin ))}h- ■ 
+I 3 +I 4 +I 5 



K =K(/ 2 3 )] + Kft 2 + C +2lJ 3 COs(q 3 -? 4 ))]+K(/ 3 2 +/ 4 2 +l' /smng + 2l 3 l 4 cos(q 4 -q 3 ) + - 
2 hhfiu. lng COS(q 4 -(it 1 2) + q fswmg - p fswing ) + 2l 3 l cjswmg COS(q 3 -(it 12) + q fswmg - P fswmg ))] + • • ■ 

+ / 3 +/ 4 +/ 5 



h 35 =K(/ c4 +lJ 3 cos(q 3 -q 4 ))]+[m s (l 4 + l cfswlng + l 3 l 4 COs(q 4 -q 3 ) + — 

V J effing cos fe - (fil 2 ) + 9/,„,„ g " Pjm*) + V./.™™* cosfe 3 - (it 1 2) + ?Jiwtw - p fswmg ))] + I 4 +I 5 



^K^CW - ^/™,^ 



h 37 
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K =K(4 2 4 ~Ue cos( ^4 +<P)-hL cosfr -q 4 )-l 2 l c4 cosfe -q 4 )+lj 3 cosfe -<?<,))]+•• • 

hl C fi„inf 0S (ii -(^ /2 ) +<7/™™ g -/W ~hh cos^ 4 -q 2 )-i 2 i cfsw ,„fosq 2 -W2)+q fsw ,„ g -/3 fswi „)+- ■ ■ 

hh cos^ 4 -q 3 )+%JcMinf°sq 4 -(x/2)+q fswlng -j3 fswin )+l 3 l cfm , ln fosq 3 -(nl2)+q fswing -p fsyvin ))}±-- ■ 



h 42 =K(C -A/ C 4 cosfr -q 4 )-l 2 l c4 cosq 2 -q 4 )+lj 3 cosq 3 -<? 4 ))]+K(/ 4 +/^ wi „ g - • • 

/^cosfr, -qj-hhfimnfwvi -(x/2)+q fsw ,„ g -p /swin )-i 2 i 4 cosq 4 -&)— • 

hK fsW ir,fOsq 2 -{Jtl2)+q fswing -P fswin )+l 3 h ™sq 4 ~%)+2l 4 l cfiwin fO&q 4 -{nl2)+q fswin -P fswin )+- 

hh/^inf ^ -(x/2)+q fsw ,„ g -/3 fswi „))]+i 4 +I S 



h 43 =K(/ c 2 4 -l 2 l cA cosq 2 -<? 4 ) +/ c4 / 3 cos(/ 3 -<? 4 ))]+K(/ 4 +l 2 cfsv ,, mg -l 2 l 4 cosq 4 -<? 2 ) - • • 
hhf swin fosq 3 -{nl2)+q fswing - P fswin ))\+I 4 +h 



h 44 =K(/ C 2 4 +/ c4 / 3 cos(<? 3 -^ 4 ))] + K(/ 4 +lc /swmg -l 2 l 4 cos(q 4 -q 2 ) + l 3 l 4 cos(q 4 -q 3 ) + — 
2l 4 l cJsw ,„ g cos(q 4 - {re 1 2) + ?J ^ - /? /swi „ g ) + l 3 l cfswing COS(q 3 -(71 1 2) + q fswmg - p fswmg ))] +I 4 +I 5 



h 45 = [m 4 (/ 2 4 )] + [m 5 (/ 2 + l) fswing + 2lJ cfiwing cos(q 4 - (n 1 2) + q fsmng - j5 fsmng ))] + / 4 + / 5 



h 46 = K (lcfs»,ng + V^S C0S( ^4 ~ O / 2 ) + tffrrt* " P fsmng ))] + 7 5 



h 47 =0 
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h<, =\mJl 2 , . —If I cos(p-(ft/2)+q, . — B, . )-/,/, cosfo, —(nl7)+q f — B, ) 

51 L 5V c/swing cfswmge * V*' / "/swing r /.swings 1 cfswmg vl V / ^ J swing r /swing' 

IJc/swing 00 ^! -(n/2)+q /swing -fi Min )+lJ c/swing cosfa -ijil2)+q fswm -/3 fswm )+-- ■ 

hLfswingZOSfa-inl^ + qfswmg-PfswinWh 



K=[ m Slfsw,ng-hh/sw,ng™^<nl2)+q fswl ^ 
hhfswingC°^4-(x/2) + q fswm -/3 fsw J + ^^^ 



h„=\mJl 2 f . —LI, ■ cosiq-, -(7i/2) + q f . — B, )-\ — 

53 L 5V c/swing 2 c/swmg ^12 V / u /swing r /swing / 

/,/, cosiq A — (?t 1 2) + q , — B t ■ ) + Ll f cos(a, -(nl2) + q, . —B, ■ ))l + /< 

4 c/swing \™ 4 V / ^ /swing r Jswing / 3 c/swwg V2 3 V / "/swing r /swing J ' 1 5 



h 54 =K(4wmg + V c/ ™™ J FOS^ 4 -W2)+q J ^-Pj^+hl<^o^<nl2)+q J ^-P J ^)b-I 5 



h 55 =K( / « +/ 4 / c/™,« 8 C0S ^4 -(«-/2) + ? /TO/ ^ -^„ g ))] + / 5 



h 56 =K(/^„ g )] + / 5 



h 57 =0 



/? 6] = [masstorso(lf or + lJ tor cos(-q turso -(nlT)-(p) + lj l0r cos(-q torw -(x/2)-q l ) + - 
hhor cos(q torso +{xl2) + q 2 ))] + I torso 
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h 62 = [masstorsdJl, + lj lm . cos(-q lorso -(jr/2)-q l ) + l 2 l lor cos(q lorso + {7tl2) + q 2 ))] + I torso 



h a = [masstorso(lf or + l 2 l tor cos(g torso +(x/2) + q 2 ))] + 1 u 



K =[ m asstorso(ll r )] + I torso ,h M ^ 66 = 



In the all of above mentioned components, the following parameters have been used and 
they can be seen in figure (1.6) : 



(32) 




Fig. (1.6). (a) The foot's and (b) The support link's geometrical configurations. 
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Where, / f swin „ and P t swin „ refer to indexes of the swing leg with respect to geometrical 

configurations of the mass center of the swing leg as can be deducted from figure (1.6). 
The coriolis and gravitational components of the relation (30) can be calculated easily. 
After calculation of kinematic and dynamic parameters of the robot, the control process 
of the system will be emerged. In summery, the adaptive control procedure has been 
used for a known seven link biped robot. The more details and the related definitions 
such as the known and unknown system with respect to control aspect can be found in 
reference [Musavi and Bagheri, 2007 and Musavi, 2006, and Bagheri and Felezi, and et. 
AL, 2006]. For the simulation of the robot, the obtained parameters and relations are 
inserted into the designed program in Matlab environment. As can be seen in 
simulation results section, the most concerns refer to stability of the robot with respect 
to the important affecting parameters of the robot movements which indicate the ankle 
and hip joints parameters [Bagheri and Najafi and et. Al. 2006], As can be seen from the 
simulations figures, the hip height and horizontal positions have considerable effects 
over the position of the ZMP and subsequently over the stability of the robot. To 
minimize the driver actuator torques of the joints, especially for the knee joint of the 
robot, the hip height which measured from the F.C.S has drastic role for diminution of 
the torques. 

4. Simulation Results 

In the designed program, the mentioned simulation processes for the two types of 
ZMP have been used for both of the nominal and un-nominal gait. For the un-nominal 
walking of the robot, the hip parameters (hip height) have been changed to consider 
the effects of the un-nominal motion upon the joint's actuator torques. The results are 
presented in figures (1.8) to (1.15) while the robot walks over declined surfaces for the 
single phase of the walking. Figure (1.15) shows combined path of the robot. The used 
specifications of the simulation of the robot are listed in table No. 1. Figures (1.8), 
(1.10) and (1.12) display the moving type of ZMP with the nominal walking of the 
robot. Figures (1.9), (1.11) and (1.13) show the same type of ZMP and also the un- 
nominal walking of the robot (with the changed hip height form the fixed coordinate 
system). Figure (1.14) shows the fixed ZMP upon descending surface. As can been seen 
from the table, the swing and support legs have the same geometrical and inertial 
values whereas in the designed program the users can choose different specifications. 
Note, the swing leg impact and the ground has been regarded in the designed program 
as given in references [Lum and et. Al. 1999 and Westervelt, 2003, and Hon and et. AL, 
1978]. Below, the saggital movement and stability analysis of the seven link biped 
robot has been considered whereas the frontal considerations are neglected. For 

convenience, 3D simulations of the biped robot are presented. In table No. 1, l an ,l a i, 
and I a f present the foot profile which are displayed in figure (1.7). The program 

enables the user to compare the results as presented in figures where the paths for the 
single phase walking of the robot have been concerned. In the program with the aid of 
the given break points, either third-order spline or Vandermonde Matrix has been 
used for providing the different trajectory paths. With the aid of the designed 
program, the kinematic, dynamic and control parameters have been evaluated. Also, 
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the two types of ZMP have been investigated. The presented simulations indicate the 
hip height effects over joint's actuator torques for minimizing energy consumption and 
especially obtaining fine stability margin. As can be seen in figures (9.h), (11. h) and 
(13. h), for the un-nominal walking of the robot with the lower hip height, the knee's 
actuator torque values is more than the obtained values as shown in figures (8.h), 
(10. h) and (12. h) (for the nominal gait with the higher hip height). This is due to the 
robot's need to bend its knee joint more at a low hip position. Therefore, the large knee 
joint torque is required to support the robot. Therefore, for reducing the load on the 
knee joint and consequently with respect to minimum energy consumption, it is 
essential to keep the hip at a high position. Finally, the trajectory path generation 
needs more precision with respect to the obtained kinematic relations to avoid the 
link's velocity discontinuities. The presented results have an acceptable consistency 
with the typical robot. 



'Sh. 
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ho. 
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an 


hb 


If 


0.3m 


0.3m 


0.3m 


0.1m 


0.1m 


0.13m 


™ sh . 


m ik. 


m T„. 


m F„. 


D s 


T c 


5.1kg 


10kg 


43kg 


33kg 


0.5m 


0.9s 


T d 


T m 


H ao 


Ko 


*ed 


x sd 


0.1 8.v 


0.4s 


0.16m 


0.4m 


0.23m 


0.23m 


Sgs 


Sgf 


H mm 


H m!a 


K 


H s 








0.60m 


0.62m 


0.1m 


0.15m 


shank 


tight 


torso 


foot 


0.02kgm 2 


0.08/tgm 2 


1 Akgnf 


O.OXkgm 1 



Table 1. The simulated robot specifications. 
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Fig. 1.7. The foot configuration. 
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(a) Stick Diagram 
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(c) Velocity 



(d) Acceleration 
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(/') Inertial Forces (h) Driver Torques 

Fig. 1.8. (a) The robot's stick diagram on A = 0" , Moving ZMP, H mia 0.60m, H mtx 0.62m 

(b) The moving ZMP diagram in nominal gait which satisfies stability criteria (c) : Shank M.C 

velocity, —: Tight M.C velocity (d) : Shank M.C acceleration, —-.Tight M.C acceleration (e) : 

Shank angular velocity, —: Tight angular velocity (ft : Shank angular acceleration, —: Tight 

angular acceleration (ft ; Shank M.C inertial force, —: Tight M.C inertial force (h) : Ankle joint 

torque, —: Hip joint torque, ...: Shank joint torque 
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Fig. 1.9. fa) The robot's stick diagram on A 0° , Moving ZMP, H . =0.50m,H =Q.52m 

(b) The moving ZMP diagram in nominal gait which satisfies stability criteria (c) : Shank M.C 

velocity, —: Tight M.C velocity (d) : Shank M.C acceleration, —-.Tight M.C acceleration (e) : 

Shank angular velocity, —: Tight angular velocity (f) : Shank angular acceleration, —: Tight 

angular acceleration (j) : Shank M.C inertial force, —: Tight M.C inertial force (h) : Ankle joint 

torque, —: Hip joint torque, ...: Shank joint torque 
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Fig. 1.10 (a) The robot's stick diagram on X = 10° , Moving ZMP, H =0.60m,H = 0.62m 

(b) The moving ZMP diagram in nominal gait which satisfies stability criteria (c) ; Shank M.C 

velocity, —: Tight M.C velocity (d) : Shank M.C acceleration, —-.Tight M.C acceleration (e) : 

Shank angular velocity, —: Tight angular velocity (f) : Shank angular acceleration, —: Tight 

angular acceleration (j) : Shank M.C inertial force, —: Tight M.C inertial force (h) : Ankle joint 

torque, —: Hip joint torque, ...: Shank joint torque 
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Fig. 1.11. 

(a) The robot's stick diagram on A = 10° , Moving ZMP, H . 0.50m,// 

(b) The moving ZMP diagram in nominal gait which satisfies stability criteria 
(c) : Shank M.C velocity, —: Tight M.C velocity 

(d) : Shank M.C acceleration, —-.Tight M.C acceleration 

(e) : Shank angular velocity, —: Tight angular velocity 

(f) : Shank angular acceleration, —: Tight angular acceleration 

(j) : Shank M.C inertial force, —: Tight M.C inertial force 

(h) : Ankle joint torque, —: Hip joint torque, ...: Shank joint torque 
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Fig. 1.12. 

(a) The robot's stick diagram on X = -8° , Moving ZMP, H = 0.60m, H = 0.62m 

(b) The moving ZMP diagram in nominal gait which satisfies stability criteria 
(c) : Shank M.C velocity, —: Tight M.C velocity 

(d) : Shank M.C acceleration, —-.Tight M.C acceleration 

(e) : Shank angular velocity, —: Tight angular velocity 

(f) : Shank angular acceleration, —: Tight angular acceleration 

(j) : Shank M.C inertial force, —: Tight M.C inertial force 

(h) : Ankle joint torque, —: Hip joint torque, ...: Shank joint torque 
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Fig. 1.13 

(a) The robot's stick diagram on X = -8° , Moving ZMP, H . = 0.50m, H = 0.52m 

(b) The moving ZMP diagram in nominal gait which satisfies stability criteria 
(c) : Shank M.C velocity, —: Tight M.C velocity 

(d) : Shank M.C acceleration, —-.Tight M.C acceleration 

(e) : Shank angular velocity, —: Tight angular velocity 

(f) : Shank angular acceleration, —: Tight angular acceleration 

(j) : Shank M.C inertial force, —: Tight M.C inertial force 

(h) : Ankle joint torque, —: Hip joint torque, ...: Shank joint torque 
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(a) Stick Diagram 
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Fig. 1.14 

(a) T/ie robot's stick diagram on A = -8° , Fixed ZMP, H . = 0.60m, H 

(b) The fixed ZMP diagram in nominal gait which satisfies stability criteria 
(c) : Shank M.C velocity, —: Tight M.C velocity 

(d) : Shank M.C acceleration, —-.Tight M.C acceleration 

(e) : Shank angular velocity, —: Tight angular velocity 

(f) : Shank angular acceleration, —: Tight angular acceleration 

(j) : Shank M.C inertial force, —: Tight M.C inertial force 

(li) : Ankle joint torque, —: Hip joint torque, ...: Shank joint torque 
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(a) Stick Diagram 



(b) ZMP 



(c) Inertial Forces 
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(d) Driver Torques (e) Driver Torques 

Fig. 1.15 (a) The robot's stick diagram on combined surface, nominal motion, Moving ZMP, 
X = -8 ° (b) The moving ZMP diagram in nominal gait which satisfies stability criteria (c) Inertial 

forces: : Supp. tight, —: Supp. shank, ...: Swing tight, .-: Swing shank (d) Joint's torques: : Swing 

shank joint, — : Swing ankle joint, . ..: Supp. hip joint, .-: Swing hip joint (e) Joint's torques: : Supp. 

ankle joint, —: Supp. shank joint 
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1. Introduction 

Many studies have been made to develop walking anthropomorphic robots able to move in 
environments well-adapted to human beings and able to cooperate with them. Among the 
more advanced projects of humanoid robots, one can mention : the Honda robots P2, P3 
(Hirai, 1997) (Hirai et al., 1998) and Asimo (Sakagami et al., 2002), the HRP series developed 
by AIST (Kaneko et al., 1998) (Kajita et al., 2005) (Kaneko et al., 2004) (Morisawa et al., 2005), 
the small robot QRIO proposed by Sony (Nagasaka et al., 2004), the KHR series developed 
by KAIST (Kim et al., 2004) (Kim et al, 2005), the last robot of Waseda University having 
seven degrees of freedom per leg (Ogura et al., 2004), Johnnie (Lohmeier et al., 2004) and H7 
(Kagami et al., 2005). These robots are namely able to climb stairs and to carry their own 
power supply during stable walking. The problem of dynamic locomotion and gait 
generation for biped robot has been studied theoretically and experimentally with quite 
different approaches. However, when searchers study the behavior or the design of 
dynamic walking robots, they inevitably meet a number of intrinsic difficulties related to 
these kinds of systems : a large number of parameters have to be optimized during the 
design process or have to be controlled during the locomotion task; the intrinsic stability of 
walking machines with dynamic behaviors is not robust; the coordination of the legs is a 
complex task. When human walks, it actively uses its own dynamic effects to ensure its 
propulsion. Today, some studies exploit the dynamic effects to generate walking gaits of 
robots. In this research field, four kinds of approaches are used. The first one uses pragmatic 
rules based on qualitative studies of human walking gaits (Pratt et al., 2001) (Sabourin et al., 
2004). The second one focuses on the mechanical design of the robot in order to obtain 
natural passive dynamic gaits (Collins et al., 2005). The third one deals with theoretical 
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studies of limit cycles (Westervelt et al., 2004) (Canudas-de-Wit et al., 2002). The fourth one 
creates various dynamic walking gaits with reference trajectories (Bruneau et al., 2001) 
(Chevallereau et al., 2003). However, all theses approaches are not really universal and do 
not allow online dynamic adaptation of the robot motions as function of the environment 
and of the real capabilities of the system. 

Our objective is to carry out a more adaptive and universal approach based on the 
dynamic equations in order to generate walking gaits with a high dynamic behavior. 
Moreover, we do not wish to impose exact temporal desired trajectories. On the contrary, 
the capabilities of the robot, in term of intrinsic dynamics and actuator torques, are 
constantly evaluated and exploited as well as possible with an aim of reaching a desired 
average walking rate of the system. In order to do this, we propose a strategy based on two 
dynamic criterions, which takes into account, at every moment, the intrinsic dynamics of the 
system and the capabilities of the actuators torques in order to modify the dynamics of the 
robot directly. These criterions called the "dynamic propulsion criterion" and the "dynamic 
propulsion potential" permit to produce analytically the motions of the legs during all 
phases and all transition phases without specification of the events to perform. Even if this 
method is universal, we illustrate it for a planar under-actuated robot without foot : RABBIT 
(Buche, 2006). The particularity of this robot is that it can not be statically stable because the 
contact area between the legs and the ground is very small. Thus, a control strategy based 
on dynamic considerations is required. 

The organization of this chapter is as follows. In the second part the presentation of the 
robot Rabbit and its characteristics are given. In part three, dynamics of the biped robot 
Rabbit are introduced. Then, in part four, the two dynamic criterions are expressed : the 
"dynamic propulsion criterion" and the "dynamic propulsion potential". In part five, we 
describe the control strategy, based on these criterions, to generate highly dynamic walking 
gaits. The sixth part gives the simulation results obtained with this analytical approach. 

2. The bipedal robot RABBIT 

The robot Rabbit (fig. 1.) is an experimentation of seven French laboratories (IRCCYN 
Nantes, LAG Grenoble, LMS Poitiers, LVR Bourges, LGIPM Metz, LIRMM Montpellier, 
LRV (new name : LISV) Versailles) concerning the control of a biped robot for walking and 
running gaits generation within the framework of a CNRS project ROBEA (Sabourin et al., 
2006) (Chevallereau et al., 2003). 

This robot is composed of two legs and a trunk and has not foot. The joints are located at the 
hip and at the knee. This robot has four motors : one for each hip, one for each knee. Rabbit 
weighs 32 kilograms and measures 1 meter. Its main masses and lengths are given in table 1. 
Its motions are included in sagittal planes, by using a radial bar link fixed at a central 
column that allows to guide the direction of progression of the robot around a circle. As 
Rabbit moves only in the sagittal plane, it can be modeled in two dimensions. This robot 
represents the minimal system able to generate walking and running gaits. Since the contact 
between the robot and the ground is just one point (passive dof (degree of freedom)), the 
robot is under-actuated during the single support phase: there are only two actuators, at the 
hip and at the knee of the stance leg, to control three parameters : 

-the trunk evolution along the moving direction x at the reference point 
-the trunk evolution along the gravity direction y at the reference point 
-the trunk angle evolution around the rotation direction z 
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Fig. 1. Robot Rabbit. 





Weight (Kg) 


Length (m) 


Trunk 


12 


0,2 


Thigh 


6,8 


0,4 


Shin 


3,2 


0,4 



Table 1. Characteristics of Rabbit. 

The configuration of Rabbit is described by the parameters given in fig.2. The reference's 
point R p is selected at the hip of the robot for the definition of the position, velocity and 
acceleration of the trunk. 




Fig. 2. Model parameters of Rabbit. 
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3. Analytical dynamics of Rabbit 

The approach is based on the dynamic equations with the lagrangian form for the robot 
Rabbit at the reference point R p . 

3.1 Lagrangian Dynamic Equations 

The lagrangian motion equations, applied at R p , are written in two subsystems: 

I^ + fA + C p + G p =A; r ^ + ^ (1) 

H;a,+M k a k +C t +G t =^'l^ + D^F^ + f (2) 

where a = \x,y,q f is the trunk accelerations vector at R p , a, = [q.,q 2 ,q\,q 4 ] T the joint 
accelerations vector, F .. (j=l,2) the contacting forces applied to the feet and T the joint 

torques vector. The index "p" refers to the trunk and the index "k" to the legs. The 
subsystem (1), coming from the first three motion equations, is related to the trunk. This 
subsystem is said passive. The subsystem (2), coming from the last four motion equations, is 
related to the legs. This subsystem is said active. 

3.2 Trunk Dynamics and External Generalized Force Expression 

The trunk dynamics and the external generalized forces expression, at the reference point, 
are obtained by isolating a, in (2) and by injecting it in (1). The terms which are functions of 

the trunk configuration uniquely are isolated in the inertia matrix P M , centrifugal and 

Coriolis vector >'C and gravity vector P Q . The other terms related with legs or coupled 

between legs and trunk are grouped in the matrix k M and vectors k Q and k Q . Thus we 

can write : 

M p ="M p + k M p 

c= p c + k c with/ ^ = o (3) 

p p p 

G ="G + k G 

p p p 

The following equations system is also obtained: 

"M p a p+ "C p+ "G p = -\ k M p -H k M; i H T k \i p 

_m^ ^ =, ^ ^ ( 4 ) 



+ H k M- k \c k +G k -TpC p - k G p 



The left term of (4) represents trunk dynamics whereas the right term is the external 
generalized forces F applied to the trunk at R p . 

3.3 Computed Torque Method 

The actuator torques can be used on one hand to control the evolution of the trunk and on 
the other hand to satisfy constraints related to the locomotion (no-contact between the 
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swing leg and the ground and landing without impact). In the second case, the actuator 
torques will be function of joint trajectories. The method used to calculate these desired 
actuator torques is the computed torque method using non linear decoupling of the 
dynamics. For that, we express the actuator torques T as function of the joint accelerations 
a. by isolating a in (1) and by injecting it in (2): 



\c,.-hIm: 1 c. 



-H!M- l G 



D„ 



-A 



-H!M- l \D 



D, 



(5) 



Then, we separate (5) in two subsystems. A subsystem, named A, composed of the 
equations of the actuator torques used to generate a desired joint trajectory. A subsystem, 
named B, composed of the other equations. In the subsystem B, we isolate the joint 
acceleration, associated to the actuator torques not used to imposed a desired joint 
trajectory. In this case, if we inject their expressions in the subsystem A, this subsystem A 
express the actuator torques used to impose joint trajectories as function of the desired joint 
acceleration and of the other actuator torques. So, knowing the configuration of the robot 
and the desired joint accelerations, given by the desired joint trajectories, we can calculated 
the actuator torques. 



3.4 Close-Loop Constraint 

During the double contact phase, we have to respect the close-loop constraint along the y 

direction when we calculate the actuator torques, in order to avoid a no desired take off of a 
leg. At first, this geometric constraint is expressed as follows: 

y P 2-y P i=g(q P ,qi,q2,q3,q4) (6) 

where y v i and y P 2 are respectively the foot position of the leg 1 and the leg 2 along the y 

direction. Then, we have to express the close-loop constraint in terms of actuator torques 
constraint. For that, we derive (6) twice: 

y P i-y P \= Ai^'ii^'iAApAxAiA^A^q^Ai^At) ( 7 ) 

where y ,—y is equal to zero. Then, we replace the joint acceleration vector by its 
expression obtained with ./ = S (5): 



m,-hm: ] h, 



T-[C k -H' k M-;C p 

dTJfZ+dTJf, 



-h m;'g„ 



-hIm: 



A, F cl 



-D F 



(8) 



Thus, we obtain the close-loop constraint as function of the joint torques: 

= /(r,^„,^„,^,^,^,^ 4 ^ P ^1^2^3^4) 



(9) 



4. Dynamic criterions 

Based on the previous equations, two dynamic criterions are proposed 
propulsion criterion and the dynamic propulsion potential. 



the dynamic 
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4.1 Dynamic Propulsion Criterion 

To start, we define the dynamic propulsion criterion. The idea of this criterion is to quantify 
exactly the acceleration required instantaneously to the robot to generate the desired 
trajectory, for the three dof. We note that the desired trajectory can be expressed just as an 
average desired speed of the robot at R p . Then, knowing the configuration of the robot, the 
aim is to calculate the joint torques to generate these desired accelerations. 
To do this, we define two generalized forces. Let F be the external generalized force 
applied to the trunk at R p by the two legs and generating the real trajectory. Let F. be the 

external generalized force which should be applied to the trunk by the two legs at R p to 
generate the desired trajectory. Thus, the dynamic propulsion of the robot will be ensured if 
F is equal to F. ■ Finally, the difference F—F- represents the dynamic propulsion criterion 

that we have to minimize. Knowing the configuration of the robot, F and the expression of 
the real generalized forces (right term of eq. 4), we can calculate the desired joint torques 
T d to generate the desired trajectory: 

-A _ _ (10) 



+ H k M- k '[C k + G k 



\c k + G k -T d \> 



C 



ZV 



-H k M~ 



D„ 






To calculate F , we use the expression of the trunk dynamics obtained in left term of (4), 

where we replace the real values x , x , x , y , y , y , q , q and q of the trajectory by their 

desired values x d , x d , x d , y d , y d , y d , q d , q d and q d (in paragraph 5.1, we will see how 

obtain these desired values). The desired trajectory being defined in acceleration terms as 
well as in velocity and position terms, we replace: 

a*,=V y-i$ (11) 

by: 

x +K v \x —x)+K\x — x) 

y d +K v {y"-y)+K p {y d -y) 

<ii+K v {q d p -q p )+K p {q d -q p 
what gives us the following expression of F. '■ 

¥ i = p M p a d p + p C d +''G d (13) 



K = 



(12) 



4.2 Dynamic Propulsion Potential 

With the dynamic propulsion criterion, we know the required acceleration to generate the 
desired trajectory and the actuator torques we must apply. However, does the robot have 
the capability to generate this required acceleration? 

To answer this question, we introduce the dynamic propulsion potential P . This criterion 
represents the capability of a system to generate a desired generalized force, i.e. the 
minimum and the maximum generalized force that it can generate. 
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For a planar system (Oxy plane), P is written as follows : 

' ■■ L rx ' rx 



P = 



p 

V MzJ 



E-min. rn. 
L ry ' ry 



(14) 



V L n ' rz \J 

This criterion integrates the technology used as well as the constraints related to the 
system's design. In the case of Rabbit, we define three criterions, one for each leg (P., P.) 

and one for the robot ( P ). 

The expressions of P and P are based on the expression of the external generalized force 
generated by each leg and the field of variation of the actuator torques. The expression of the 
external generalized force performed by each leg is obtained with the decomposition of (4): 



f*=-K 



A„ 



-H^Hl 



-H kJ M, l D Jk 



k, 



+#X 



q,+<v 



r , 



g C -'G 



(15) 



where r and Y are respectively the hip and knee actuator torque of the leg j. 

Then, we have to determine the field of variation of the actuator torques. For that, the idea is 
to limit the field of variation of the actuator torques, given by the actuator technology, such 
as the joint actuators have the capability to stop the motion before reaching the joint limits. 
To determinate the limits of the field of variation of each actuator torque, we use three 
relations. The first relation (16) integrates the actuator technology. The joint velocity and the 
actuator torque give the maximum and the minimum actuator torque that the joint actuator 
can generate for the next sample time : 

{q„r,)^{rr,Tr) ( 16 ) 

The second relation (17), based on (8), gives the expression of the joint acceleration as 
function of the actuator torque: 

r, => q, (17) 

The third relation (18) gives the new joint velocity (q' +a ) and position (q' +a ), knowing the 
new joint acceleration (q' +a ) and the old angular velocity (q') and position («') : 



q, 



■ q t + q i a 



q?* =q' i +q' i a + q' i 



2 



(18) 



where St is the sample time. 

With these three relations and the maximum and minimum joint limits (g mln ,« max ), we limit 

the field of variation of the actuator torque. Here is explained the method to determinate the 
maximum torque that the joint actuator can generate, but the principle is the same for the 
minimum torque. This method is given by fig. 3. 
With (16), we determine the maximum torque r max that the technology of the joint actuator 

allows to generate. Then, we check if the joint actuator has the capability to stop the joint 
motion before reaching the joint limit « max , if we apply p max . In order to perform that, we 
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calculate, with (17) and (18), the new joint configuration that we would have obtained if we 
had applied r max • Then, in order to stop the joint motion, we apply the minimum torque 

that the joint actuator can generate in this new joint configuration. We continue this loop as 
long as the joint velocity is positive. And the minimum torque is recalculated at each step of 
time as function of the new joint configuration with (16). Then, we compare the joint 
position obtained with the joint limit g max . If it is lower than q mia , r max can be applied 
because the joint actuator has the capability to stop the joint motion before reaching the joint 
limit. Otherwise, r max is reduced until obtaining the maximum value allowing to stop the 

joint motion before reaching the joint limit. 
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Fig. 3. Determination of the real maximum torque. 

To summarize, the algorithm of fig. 3. permits to calculate the lower and upper limit of the 
effective torque that we can really apply, because it takes into account the configuration of 
the robot, the actuator technology and the joint limits: 

f,=[i^;iST] ( 19 ) 

So, knowing the field of variation of the torque that the joint actuators can generate (19) and 
using (15), the dynamic propulsion potential of each leg and the dynamic propulsion 
potential of the robot are computed. 

Related to the design of Rabbit, one joint actuator per leg has an important influence on the 
propulsion along the moving direction. It is the knee joint for the stance leg and the hip joint 
for the swing leg. So, we decide to consider only the field of variation of one actuator torque 
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per leg to calculate its dynamic propulsion potential. Consequently, the dynamic propulsion 
potential p of the leg i is: 

x ;r s ,)},max{F„(rr;rJ;F„(rr;rJ} ' 
" ; r„ )}, maxfo (rr ; r„ }, F„ (r™» ; rv )} 
- ; F gl I max{M„ (r 4 f ; T gl }, M„ (r»« ; T g , )} 



(20) 



(21) 



mmfc(rr;r 8 ,>„(r, 

min{F o .(rr;r 4 ,},F r> .(r ; 

y nm{M rz {r-;r g ,},M r M 
If the leg i is is the swing leg, and : 

minfc (r fa . ; r™ }, F„ (r„ ; r~ I max{F„ (r, ; T? }, F„ {t u ; r" )} ' 
min{F rr (r 6 , ; r? ); F ry {r„, ; r™» )}, maxfc r (r„ ; r™ }, F„ (r„ ; r~ )} 
nin{M„ (r„ ; r°" }, M r: (r„ ; r™ )} max{M„ z (r,„ ; r™ 1 " }, M r: (r„ ; r" )} 
If the leg i is the stance leg. 

Knowing the potential of each leg, we calculate the dynamic propulsion potential P of the 
robot. During the single contact phase, with no close-loop constraints to respect we have 
directly: 

P = P 1+ P 2 (22) 

During the double contact phase, we have to respect the close-loop constraint along the y 
direction. So, we limit again the field of variation of the actuator torques at the knee joint by 
taking into account this constraint. Equation (9) establishing a direct relation between the 
two actuator torques, we recalculate the minimum and the maximum torque of each joint 
actuator compatible with the minimum and the maximum torque of the other joint torque. 
This gives the new field of variation of the torque in (19) in the case of the double contact 
phase. Then, we recalculate with (21) the dynamic propulsion potential of each leg and, in 
the same way as during the single contact phase, the potential of the robot with (22). 

5. Control Strategy 

Based on the dynamic propulsion of the two legs (P ,P ) and of the robot ( P ), we are able 

to produce for a desired average walking speed (v a ) the motion during the different 
locomotion phases and to detect the necessity to perform a transition between one phase to 
another. To perform this, we have to define on one hand the trajectory of R p as function of v a 
and on the other hand we have to define the control strategy to generate this desired 
trajectory. 

5.1 Trajectory definition 

The desired trajectory is defined by v a and the desired trunk angle, is fixed equal to zero. 
With these two parameters, the nine parameters x d , x d ,x d , y d , y d , y d , q d , q d and q d 

are computed: 



y d =y 



y d =y 



y d =y 



(23) 



ff,'=0 ; «,'=-*'"*' 



a 






where a is the sample time. We note that the evolution of the height y d is free. 
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5.2 Control Strategy based on generalized forces 

We wish to generate the desired trajectory along the x direction and around the z direction. 
For that, with (13), we determine F. and M. that the robot has to perform. Then, we verify if 

the robot has the capability to generate F and M. ■ In order to do that, we compare F. to 

the lower and upper bounds of_P .If F, is included in P , the robot has the capability to 

generate F ■ Otherwise, F. is limited to the maximum force that the robot can generate along 

the x direction if F. is larger than the upper limit of P and limited to the minimum force if 

F is lower than the lower limit of P . We use the same algorithm for M ■ We note that the 

ix Fx & iz 

direction with the most important constraint is along the x direction. Thus, with the dynamic 
propulsion criterion (10) along the x direction and around the z direction, we calculate the 
actuator torques we must apply to generate F. and M . '■ 

^=/(r)=/(r M ,r gl ,r M ,r g2 ) (24) 

M,=f{T)=f{r hl ,T gl ,T h2 ,T g2 ) (25) 

However, the constraints related to the Rabbit's design and to the generation of walking 
gaits have to be taken into account. So, we complete the calculation of the desired actuator 
torques by taking into account these constraints as function of the different phases of the 
walking gaits. 

1) Double Contact Phase: During the double contact phase, we have to respect the close- 
loop constraint (9). As we have three equations ((24), (25) and (9)) to check, we use a 
generalized inverse to calculate the four desired joint torques. 

2) Transition from the Double to the Single Contact Phase: In order to not exceed the 
capability of the legs, we initiate the single contact phase when the dynamic propulsion 
potential of one leg is weak, i.e. when the force generated by the leg along or around a 
direction reaches the maximum or the minimum forces that it can generate. In the case of 
walking gaits, the dynamic propulsion potential concerned is that of the back leg along the 
x direction. 

During this phase, we have to ensure the take off the back leg. For that, we define a desired 
height for the tip of the back leg, then a trajectory to joint it. The time to reach this desired 
height is not fixed arbitrarily. We determine the shortest time to reach this height such as 
the capability of the knee actuator of the back leg is not exceeded. So, this time and 
consequently the trajectory is recalculated at each step of time. We decide to use the knee 
actuator of the back leg because it has a very weak influence on the propulsion of the robot. 
With inverse kinematics, we calculate the desired joint acceleration at the knee joint and, 
with the computed torque method, the desired actuator torque at the knee joint. 
With the three other joints, we perform the dynamic propulsion. As the swing leg has a 
weak influence on the propulsion around z , we decide to distribute F. on each leg, as 

function of their dynamic propulsion potential along the x direction, that gives us the 
following decomposition for (24): 

^i=/(r*,rj (26) 

^ 2 =/(r M ) (27) 

if the leg 1 is the stance leg and the leg 2 the swing leg. So, as we have three equations ((26), 
(27) and (25)), we can calculate the last three desired actuator torques directly. 
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3) Single Contact Phase: During this phase, we use the same strategy as during the 
transition phase. The only difference is that we use the knee actuator of the swing leg to 
preserve the desired height for the tip of the swing leg and not to join it. 

4) Transition from the Single to the Double Contact Phase: In order to not exceed the 
capability of the legs, we initiate the return to the double contact phase when the dynamic 
propulsion potential of one leg is weak. 

During this transition, we have to ensure the landing of the swing leg, which has to avoid a 
strong impact. So, we must control the trajectory of the tip of the swing leg along the x and 
y directions. In order to carry out this, we use the same strategy as this one used to control 
the height of the swing leg's tip during the single contact phase. However, we have to 
control the trajectory along two directions. Thus, we decide to use the two joint actuators of 
the swing leg. 

With the two other joint actuators we perform the dynamic propulsion. As we do not use 
the swing leg to ensure the dynamic propulsion, we compare the force and the torque, that 
the stance leg has to perform to generate p. and M. respectively, with its dynamic 

propulsion potential. If the stance leg does not have the capability to generate them, we 
limit them. So, as we have two equations ((24) and (25)), we can calculate the last two 
desired actuator torques directly. 

6. Simulation results 

With these criterions, we develop a strategy to control RABBIT and we validate it with 
the simulation of the first step of a walking gait. To start, we define the desired average 
velocity at R p . As an illustration, linear velocity % is imposed and the desired trunk 
angle q p is equal to zero. The evolution of the height y is free. So, we ensure the 
dynamic propulsion along the x direction and around the z direction only. The next 
idea is to modify, via the joint torques, the dynamic effects generated by the legs in 
order to make f converge towards p. With the equations along the x direction and 

around the z direction, we can calculate r"* directly. However, in this case, we do not 
take into account the constraints related to the robot's design and to the generation of 
walking gaits. These constraints are introduced in the next section as functions of 
locomotion phases. 

6.1 Double support phase 

During this phase, we have four joint actuators to ensure the dynamic propulsion and the 
closed loop constraint (Fig. 2. with y t f=0). As we have three equations, we use a generalized 
inverse to calculate r d . In order to not exceed the capability of the robot, we use the 
dynamic propulsion potential of each leg. We compare the force generated by the legs to 
their dynamic propulsion potential along the x direction. If a leg reaches its maximum 
capability, we decide to start its take off to perform the single support phase. 
We see that the robot generates the desired moving velocity at R p (Fig. 4.), without 
exceeding its capability nor the capability of each leg along the x direction (Fig.5). After 
1,3 seconds, the maximum capability of the back leg along the x direction reduces. Indeed, 
the knee joint of the back leg approaches its joint stop (Fig. 6). So, when the back leg 
reaches its maximum capability, the take off starts in order to perform the single support 
phase. 
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6.2 Single support phase 

During this phase, we have four joint actuators to ensure the dynamic propulsion and to 
avoid the contact between the tip of the swing leg and the ground. In order to do this, the 
swing leg's knee is used to avoid the contact while the three other joints perform the 
dynamic propulsion of the robot. 

The joint torque of the swing leg' knee is calculated with a computed torque method using 
non linear decoupling of the dynamics. The desired joint acceleration, velocity and position 
of the swing leg knee are calculated with inverse kinematics. We express the joint torque of 
the swing leg' knee as function of the three other joint torques and of the desired control 
vector components of the swing leg's knee. 

With the three other joints we perform the dynamic propulsion. It is possible that the 
robot does not have the capability to propel itself along the x direction dynamically. In 
this case, we limit the desired force with its dynamic propulsion potential. Then, we 
distribute this desired force with the dynamic propulsion potential of each leg. In order 
to keep a maximum capability for each leg, the desired force generated by each leg is 
chosen to be as further as possible from the joint actuators limits. In this case, we have 
three equations, one for the desired force along the x direction for each leg and one for 
the desired force around the z direction, to calculate the three joint torques. The joint 
torque of the swing leg's knee is replaced by its expression in function of the three other 
joint torques. So, we calculate the three joints torque performing the dynamic 
propulsion, then the joint torque avoiding the contact between the tip of the swing leg 
and the ground. 

We see that the robot can ensure the propulsion along the x direction (Fig. 7) and generates 
the desired moving velocity (single contact phase in Fig. 4.). Moreover, the control strategy 
involves naturally the forward motion of the swing leg (Fig. 8). After 1,675 seconds, the 
robot can not ensure exactly the desired propulsion along the x direction. Indeed, the swing 
leg is passed in front of the stance leg and the system is just like an inverse pendulum 
submitted to the gravity and for which rotational velocity increases quickly. 
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Fig. 7. Dynamic propulsion potential and real force of the robot along the x direction at 
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7. Conclusions and future work 

In this paper, we presented an analytical approach for the generation of walking gaits with a 
high dynamic behavior. This approach, using the dynamic equations, is based on two dynamic 
criterions: the dynamic propulsion criterion and the dynamic propulsion potential. Furthermore, 
in this method, the intrinsic dynamics of the robot as well as the capability of its joint torques are 
taken into account at each sample time. 

In this paper, in order to satisfy the locomotion constraints, for instance the no-contact 
between the tip of the swing's leg and the ground, we selected joint actuators, for instance 
the knee to avoid this contact during the single contact phase. Our future work will consist 
in determining the optimal contribution of each joint actuator by using the concept of 
dynamic generalized potential in order to satisfy at the same time the dynamic propulsion 
and the locomotion constraints. In this case, just one desired parameter will be given to the 
robot: the average speed. Thus, all the decisions concerning the transitions between two 
phases or the motions during each phase will be fully analytically defined. 
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1. Introduction 



This chapter addresses the design of a robot eye featuring the mechanics and motion 
characteristics of a human one. In particular the goal is to provide guidelines for the 
implementation of a tendon driven robot capable to emulate saccadic motions. 
In the first part of this chapter the physiological and mechanical characteristics of the eye- 
plant 1 in humans and primates will be reviewed. Then, the fundamental motion strategies 
used by humans during saccadic motions will be discussed, and the mathematical 
formulation of the relevant Listing's Law and Half-Angle Rule, which specify the geometric 
and kinematic characteristics of ocular saccadic motions, will be introduced. 
From this standpoint a simple model of the eye-plant will be described. In particular it will 
be shown that this model is a good candidate for the implementation of Listing's Law on a 
purely mechanical basis, as many physiologists believe to happen in humans. Therefore, the 
proposed eye-plant model can be used as a reference for the implementation of a robot 
emulating the actual mechanics and actuation characteristics of the human eye. 
The second part of this chapter will focus on the description of a first prototype of fully 
embedded robot eye designed following the guidelines provided by the eye-plant model. 
Many eye-head robots have been proposed in the past few years, and several of these 
systems have been designed to support and rotate one or more cameras about independent 
or coupled pan-tilt axes. However, little attention has been paid to emulate the actual 
mechanics of the eye, although theoretical investigations in the area of modeling and control 
of human-like eye movements have been presented in the literature (Lockwood et al., 1999; 
Polpitiya & Ghosh, 2002; Polpitiya & Ghosh, 2003; Polpitiya et al., 2004). 

Recent works have focused on the design of embedded mechatronic robot eye systems (Gu 
et al., 2000; Albers et al., 2003; Pongas et al., 2004). In (Gu et al., 2000), a prosthetic 
implantable robot eye concept has been proposed, featuring a single degree-of-freedom. 
Pongas et al., (Pongas et al., 2004) have developed a mechanism which actuates a CMOS 
micro-camera embedded in a spherical support. The system has a single degree-of-freedom, 
and the spherical shape of the eye is a purely aesthetical detail; however, the mechatronic 
approach adopted has addressed many important engineering issues and led to a very 



1 By eye-plant we mean the eye-ball and all the mechanical structure required for its 
actuation and support. 
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interesting system. In the prototype developed by Albers et al., (Albers et al., 2003) the 
design is more humanoid. The robot consists of a sphere supported by slide bearings and 
moved by a stud constrained by two gimbals. The relevance of this design is that it actually 
exploits the spherical shape of the eye; however, the types of ocular motions which could be 
generated using this system have not been discussed. 

In the following sections the basic mechanics of the eye-plant in humans will be described 
and a quantitative geometric model introduced. Then, a first prototype of a tendon driven 
robot formed by a sphere hold by a low friction support will be discussed. The second part 
of the chapter will described some of the relevant issues faced during the robot design. 

2. The human eye 

The human eye has an almost spherical shape and is hosted within a cavity called orbit; it 
has an average diameter ranging between 23 mm and 23.6 mm, and weighs between 7 g and 
9 g. The eye is actuated by a set of six extra-ocular muscles which allow the eye to rotate about 
its centre with negligible translations (Miller & Robinson, 1984; Robinson, 1991). 
The rotation range of the eye can be approximated by a cone, formed by the admissible 
directions of fixation, with an average width of about 76 deg (Miller & Robinson, 1984). The 
action of the extra-ocular muscles is capable of producing accelerations up to 20.000 deg sec 2 
allowing to reach angular velocities up to 800 deg sec 1 (Sparks, 2002). 

The extra-ocular muscles are coupled in agonostic/antagonistic pairs, and classified in two 
groups: recti (medial/ lateral and superior/ inferior), and obliqui (superior / inferior). The four recti 
muscles have a common origin in the bottom of the orbit (annulus of Zinn); they diverge and 
run along the eye-ball up to their insertion points on the sclera (the eye-ball surface). The 
insertion points form an angle of about 55 deg with respect to the optical axis and are placed 
symmetrically (Miller & Robinson, 1984; Koene & Erkelens, 2004). (Fig. 1, gives a qualitative 
idea of the placement of the four recti muscles.) The obliqui muscles have a more complex path 
within the orbit: they produce actions almost orthogonal to those generated by the recti, and 
are mainly responsible for the torsion of the eye about its optical axis. The superior oblique has 
its origin from the annulus of Zinn and is routed through a connective sleeve called troclea; the 
inferior oblique starts from the side of the orbit and is routed across the orbit to the eye ball. 



\55 deg 



Fig. 1. Frontal and side view of the eye: qualitative placement of recti muscles. 

Recent anatomical and physiological studies have suggested that the four recti have an 
important role for the implementation of saccadic motions which obey to the so called 
Listing's Law. In fact, it has been found that the path of the recti muscles within the orbit is 
constrained by soft connective tissue (Koornneef, 1974; Miller, 1989, Demer et al., 1995, 
Clark et al. 2000, Demer et al., 2000), named soft-pulleys. The role of the soft-pulleys to 
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generate ocular motions compatible with Listing's Law in humans and primates is still 
debated (Hepp, 1994; Raphan, 1998; Porrill et al., 2000; Wong et al, 2002; Koene & Erkelens 
2004; Angelaki, 2004); however, analytical and simulation studies suggest that the 
implementation of Listing's Law on a mechanical basis is feasible (Polpitiya, 2002; Polpitiya, 
2003; Cannata et al, 2006; Cannata & Maggiali, 2006). 

3. Saccadic motions and Listing's Law 

The main goal of the section is to introduce saccades and provide a mathematical 
formulation of the geometry and kinematics of saccadic motions, which represent the 
starting point for the development of models for their implementation. 

Saccadic motions consist of rapid and sudden movements changing the direction of 
fixation of the eye. Saccades have duration of the order of a few hundred milliseconds, 
and their high speed implies that these movements are open loop with respect to visual 
feedback (Becker, 1991); therefore, the control of the rotation of the eye during a saccade 
must depend only on the mechanical and actuation characteristics of the eye-plant. 
Furthermore, the lack of any stretch or proprioceptive receptor in extra-ocular muscles 
(Robinson, 1991), and the unclear role of other sensory feedback originated within the 
orbit (Miller & Robinson, 1984), suggest that the implementation of Listing's Law should 
have a strong mechanical basis. 

Although saccades are apparently controlled in open-loop, experimental tests show that 
they correspond to regular eye orientations. In fact, during saccades the eye orientation is 
determined by a basic principle known as Listing's Law, which establishes the amount of 
eye torsion for each direction of fixation. Listing's Law has been formulated in the mid of 
the 19* century, but it has been experimentally verified on humans and primates only 
during the last 20 years (Tweed & Vilis, 1987; Tweed & Vilis, 1988; Tweed & Vilis, 1990; 
Furman & Schor, 2003). 

Listing's Law states that there exists a specific orientation of the eye (with respect to a head 
fixed reference frame <h> = {hyhifa}), called primary position. During saccades any 
physiological orientation of the eye (described by the frame <e> = {e^eie^f), with respect to 
the primary position, can be expressed by a unit quaternion q whose (unit) rotation axis, v, 
always belongs to a head fixed plane, X The normal to plane »Tis the eye's direction of 
fixation at the primary position. Without loss of generality we can assume that e$ is the 
fixation axis of the eye, and that <h> = <e> at the primary position: then, X- spanjhi, hi}. 
Fig. 2 shows the geometry of Listing compatible rotations. 

In order to ensure that we .Tat any time, the eye's angular velocity a, must belong to a plane 
^, passing through v, whose normal, n a , forms an angle of 0/2 with the direction of fixation 
at the primary position, see Fig. 3. This property, directly implied by Listing's Law, is 
usually called Half Angle Rule, (Haslwanter, 1995). During a generic saccade the plane =?5 is 
rotating with respect to both the head and the eye due to its dependency from v and 6. This 
fact poses important questions related to the control mechanisms required to implement the 
Listing's Law, also in view of the fact that there is no evidence of sensors in the eye-plant 
capable to detect how ^is oriented. Whether Listing's Law is implemented in humans and 
primates on a mechanical basis, or it requires an active feedback control action, processed by 
the brain, has been debated among neuro-physiologists in the past few years. The evidence 
of the so called soft pulleys, within the orbit, constraining the extra ocular muscles, has 
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suggested that the mechanics of the eye plant could have a significant role in the 
implementation of Half Angle Rule and Listing's Law (Quaia & Optican, 1998; Raphan 1998; 
Porril et al., 2000; Koene & Erkelens, 2004), although counterexamples have been presented 
in the literature (Hepp, 1994; Wong et al, 2002). 




Fig. 2 Geometry of Listing compatible rotations. The finite rotation of the eye fixed frame 
<e>, with respect to <h> is described by a vector v always orthogonal to ?i3. 




Fig. 3. Half Angle Rule geometry. The eye's angular velocity must belong to the plane ^, 
passing through axis v. 



4. Eye Model 

The eye in humans has an almost spherical shape and is actuated by six extra-ocular 
muscles. Each extra-ocular muscle has an insertion point on the sclera, and is connected 
with the bottom of the orbit at the other end. Accordingly to the rationale proposed in 
(Haslwanter, 2002; Koene & Erkelens, 2004), only the four rectii extra-ocular muscles play a 
significant role during saccadic movements. In (Lockwood et al., 1989), a complete 3D model 
of the eye plant including a non linear dynamics description of the extra-ocular muscles has 
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been proposed. This model has been extended in (Polpitiya & Ghosh, 2002; Polpitiya & 
Ghosh, 2003), including also a description of the soft pulleys as elastic suspensions (springs). 
However, this model requires that the elastic suspensions perform particular movements in 
order to ensure that Listing's Law is fulfilled. The model proposed in (Cannata et al., 2006; 
Cannata & Maggiali, 2006), and described in this section, is slightly simpler than the 
previous ones. In fact, it does not include the dynamics of extra-ocular muscles, since it can 
be shown that it has no role in implementing Listing's Law, and models soft pulleys as fixed 
pointwise pulleys. As it will be shown in the following, the proposed model, for its 
simplicity, can also be used as a guideline for the design of humanoid tendon driven robot 
eyes. 

4.1 Geometric Model of the Eye 

The eye-ball is assumed to be modeled as a homogeneous sphere of radius R, having 3 
rotational degrees of freedom about its center. Extra-ocular muscles are modeled as non- 
elastic thin wires (Koene & Erkelens, 2004), connected to pulling force generators (Polpitiya 
& Ghosh, 2002). Starting from the insertion points placed on the eye-ball, the extra-ocular 
muscles are routed through head fixed pointwise pulleys, emulating the soft-pulley tissue. 
The pointwise pulleys are located on the rear of the eye-ball, and it will be shown that 
appropriate placement of the pointwise pulleys and of the insertion points has a 
fundamental role to implement the Listing's Law on a purely mechanical basis. 
Let O be the center of the eye-ball, then the position of the pointwise pulleys can be 
described by vectors /?,, while, at the primary position insertion points can be described by 
vectors c,-, obviously assuming that | C; | = R. When the eye is rotated about a generic axis v 
by an angle 8, the position of the insertion points can be expressed as: 

r i = R(v,0)e i Vz = 1...4 (1) 

where R(v, 6) is the rotation operator from the eye to the head coordinate systems. 
Each extra-ocular muscle is assumed to follow the shortest path from each insertion point to 
the corresponding pulley, (Demer et al., 1995); then, the path of the each extra-ocular 
muscle, for any eye orientation, belongs to the plane defined by vectors n and /?,-. Therefore, 
the torque applied to the eye by the pulling action r, > 0, of each extra-ocular muscle, can be 
expressed by the following formula: 

m : =r^^ V/ = 1...4 (2) 



\ r i*Pi\ 




Fig. 4. The relative position of pulleys and insertion points when the eye is in the primary position. 
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From expression (2), it is clear that | p, 1 does not affect the direction or the magnitude of m, 
so we can assume in the following that | pi \ = | c,- 1 . Instead, the orientation of the vectors pt, 
called principal directions, are extremely important. In fact, it is assumed that p, and d are 
symmetric with respect to the plane X; this condition implies: 

{v- Ci ) = (v Vi ) Vi = 1...4,V»6j (3) 

Finally, it is assumed that insertion points are symmetric with respect to the fixation axis: 

(h 3 -c i ) = (h 3 -c j ) Vi,j = l..A (4) 

and 

{c 3 - Cl )-{c 4 -c 2 )=0 V/J-1...4 (5) 

4.2 Properties of the Eye Model 

In this section we review the most relevant properties of the proposed model. First, it is 
possible to show that, for any eye orientation compatible with Listing's Law, all the torques 
m, produced by the four rectii extra-ocular muscles belong to a common plane passing 

through the finite rotation axis v e X, see (Cannata et al., 2006) for proof. 

Theorem 1: Let v ejbe the finite rotation axis for a generic eye orientation, then there exists 

a plane J^Tpassing through v such that: 

m ; e^T Vz' = 1...4 
A second important result is that, at any Listing compatible eye's orientation, the relative 
positions of the insertion points and pointwise pulleys form a set of parallel vectors, as 
stated by the following theorem, see (Cannata et al., 2006) for proof. 

Theorem 2: Let v ejbe the finite rotation axis for a generic eye orientation, then: 

(r,.-p ; )x(r.-p.)=0 V/,j=1...4 
Finally, it is possible to show that planes JJiTand <f£, are coincident, see (Cannata et al., 2006) 
for proof. 
Theorem 3: Let v ejbe the finite rotation axis for a generic eye orientation, then: 

111,6^ V/ = 1...4 
Remark 1: Theorem (3) has in practice the following significant interpretation. For any 
Listing compatible eye orientation any possible torque applied to the eye, and generated 
using only the four rectii extra-ocular muscles, must lay on plane !%. 

The problem now is to show, according to formula (2), when arbitrary torques m, e ^ can 
be generated using only pulling forces. Theorem 2 and theorem 3 imply that ttli are all 
orthogonal to the vector n a , normal to plane tfa. Therefore, formula (2) can be rewritten as: 

4 

f = -" w x(2> r ;) ^ 

where! 



r<=\ 



r >0 V/ = 1...4 ( 7 ) 



k xr. 



take into account the actual pulling forces generated by the extra-ocular muscles. From 
formula (6), it is clear that t is orthogonal to a convex linear combination of vectors r;. Then, 
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it is possible to generate any torque vector laying on plane <?q, as long as n a belongs to the 
convex hull of vectors r; as shown in Fig. 5. 




Fig. 5. When vector n a belongs to the convex hull of vectors r,- then rectii extra-ocular 
muscles can generate any admissible torque on P a . 

Remark 2: The discussion above shows that the placement of the insertion points affects the 
range of admissible motions. According to the previous discussion when the eye is in its 
primary position any torque belonging to plane Xcan be assigned. The angle /? formed by 
the insertion points with the optical axis determines the actual eye workspace. For an angle 
P= 55 deg the eye can rotate of about 45 deg in all directions with respect to the direction of 
fixation at the primary position. 

Assume now that, under the assumptions made in section 3, a simplified dynamic model of 
the eye could be expressed as: 

Id)=T (8) 

where I is a scalar describing the momentum of inertia of the eye-ball, while a is its angular 
acceleration of the eye. Let us assume at time the eye to be in the primary position, with 
zero angular velocity (zero state). Then, the extra-ocular muscles can generate a resulting 
torque of the form: 

T=V0(t) (9) 

where v e X is a constant vector and 6(t) a scalar control signal. Therefore, a and a are 
parallel to v; then, it is possible to reach any Listing compatible orientation, and also, during 
the rotation, the Half Angle Rule is satisfied. Similar reasoning can be applied to control the 
eye orientation to the primary position starting from any Listing compatible orientation and 
zero angular velocity. 

The above analysis proves that saccadic motions from the primary position to arbitrary 
secondary positions can be implemented on a mechanical basis. However, simulative 
examples, discussed in (Cannata & Maggiali, 2006), show that also generic saccadic motions 
can be implemented adopting the proposed model. Theoretical investigations on the model 
properties are currently ongoing to obtain a formal proof of the evidence provided by the 
simulative tests. 
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5. Robot Eye Design 

In this section we will give a short overview of a prototype of humanoid robot eye designed 
following the guidelines provided by the model discussed in the previous section, while in 
the next sections we will discuss the most relevant design details related to the various 
modules or subsystems. 

5.1 Characteristics of the Robot 

Our goal has been the design of a prototype of a robot eye emulating the mechanical 
structure of the human eye and with a comparable working range. Therefore, the first and 
major requirement has been that of designing a spherical shape structure for the eye-ball 
and to adopt a tendon based actuation mechanism to drive the ocular motions. The model 
discussed in the previous sections allowed to establish the appropriate quantitative 
specifications for the detailed mechanical design of the system. 

At system level we tried to develop a fairly integrated device, keeping also into account the 
possible miniaturization of the prototype to human scale. The current robot eye prototype 
has a cylindrical shape with a diameter of about 50 mm and an overall length of about 100 
mm, Fig. 6; the actual eye-ball has a diameter of 38.1 mm (i.e. about 50% more than the 
human eye). These dimensions have been due to various trade-offs during the selection of 
the components available off-the-shelf (e.g. the eye-ball, motors, on board camera etc.), and 
budget constraints. 



5.2 Components and Subsystems. 

The eye robot prototype consists of various components and subsystems. The most relevant, 
discussed in detail in the next sections, are: the eye-ball, the eye-ball support, the pointwise 
pulleys implementation, the actuation and sensing system, and the control system architecture. 
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Fig. 6. Outline of the robot eye. 

The design of the eye-ball and its support structure, has been inspired by existing ball 
transfer units. To support the eye-ball it has been considered the possibility of using thrust- 
bearings, however, this solution has been dropped since small and light components for a 
miniature implementation (human sized eye), were not available. The final design has been 
based on the implementation of a low friction (PTFE) slide bearing, which could be easily 
scaled to smaller size. 

The actuation is performed by tendons, i.e. thin stiff wires, pulled by force generators. The 
actuators must provide a linear motion of the tendons with a fairly small stroke (about 30 
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mm, in the current implementation, and less than 20 mm for an eye of human size), and 
limited pulling force. In fact, a pulling force of 2.5 N would generate a nominal angular 
acceleration of about 6250 rad sec 2 , for an eye-ball with a mass 50 g and radius of 20 mm, 
and about 58000 rad sec 2 in the case of an eye of human size with a mass of 9 g and radius of 
12 mm. The actuators used in the current design are standard miniature DC servo motors, 
with integrated optical encoder, however, various alternative candidate solutions have been 
taken into account including: shape memory alloys and artificial muscles. According to 
recent advances, (Carpi et al., 2005; Cho & Asada, 2005), these technologies seem very 
promising as alternative solutions to DC motors mostly in terms of size and mass (currently 
the mass of the motors is about 160 g, i.e. over 50% of the total mass of the system, without 
including electronics). However, presently both shape memory alloys and artificial muscles 
require significant engineering to achieve operational devices, and therefore have not be 
adopted for the first prototype implementation. 
In the following the major components and subsystems developed are reviewed. 

6. The Eye-Ball 

The eye ball is a precision PTFE sphere having a diameter of 38.1 mm (1.5m). The sphere has 
been CNC machined to host a commercial CMOS camera, a suspension spring, and to route 
the power supply and video signal cables to the external electronics. A frontal flange is used 
to allow the connection of the tendons at the specified insertion points, and to support 
miniature screws required to calibrate the position of the camera within the eye ball. On the 
flange it is eventually placed a spherical cover purely for aesthetical reasons. Fig. 7 and Fig. 
8 show the exploded view and the actual eye-ball. 
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Fig. 7. Exploded view of the eye-ball. 




Fig. 8. The machined eye-ball (left), and the assembled eye-ball (camera cables shown in 
background). 
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The insertion points form an angle of 55 cleg, with respect to the (geometric) optical axis of 
the eye, therefore the eye-ball can rotate of about 45 cleg in all directions. The tendons used 
are monofiber nylon coated wires having a nominal diameter of 0.25 mm, well 
approximating the geometric model proposed for the extra-ocular muscles. 

7. Supporting Structure 

The structure designed to support the eye ball is formed by two distinct parts: a low friction 
support, designed to hold the eye ball, Fig. 9, and a rigid flange used to implement the 
pointwise pulleys, and providing appropriate routing of the actuation tendons) required to 
ensure the correct mechanical implementation of Listing's Law, Fig. 13. 




Fig. 9. CAD model of the eye-ball support (first concept). 

7.1 The Eye-Ball Support 

The eye-ball support is made of two C-shaped PTFE parts mated together, Fig. 10. The rear 
part of the support is drilled to allow the routing of the power supply and video signal 
cables to and from the on board camera. 

The eight bumps on the C-shaped parts are the actual points of contact with the eye-ball. The 
placement of the contact points has been analysed by simulation in order to avoid interference 
with the tendons. Fig. 11 shows the path of one insertion point when the eye is rotated along 
the boundary of its workspace (i.e. the fixation axis is rotated to form a cone with amplitude of 
45 cleg). The red marker is the position of the insertion point at the primary position while the 
green markers represent the position of two frontal contact points. The north pole in the figure 
represents the direction of axis h^. The frontal bumps form an angle of 15 cleg with respect to 
the equatorial plane. The position of the rear bumps is constrained by the motion of the camera 
cables coming out from the eye-ball. To avoid interferences the rear bumps form an angle of 35 
cleg with respect to equatorial plane of the eye. 



7.2 The Pointwise Pulleys 

The rigid flange, holding the eye-ball support, has the major function of implementing the 
pointwise pulleys. The pulleys have the role of constraining the path of the tendons so that, 
at every eye orientation, each tendon passes through a given head fixed point belonging to 
the principal direction associated with the corresponding pointwise pulley. 
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Let us assume the eye in a Listing compatible position A, then we may assume that a generic 
tendon is routed as sketched in Fig. 12. The pulley shown in the figure is tangent to the 
principal direction at point p„ and it allows the tendon to pass through p^ Assume now to 
rotate the eye to another Listing compatible position B; if the pulley could tilt about the 
principal axis, during the eye rotation, the tangential point p, would remain the same so that 
the tendon is still routed through point p,-. Therefore, the idea of the pulley tilting (about the 
principal axis) and possibly rotating (about its center), fully meets the specifications of the 
pointwise pulleys as defined for the eye model. 




Fig. 10. The eye-ball support is formed by two PTFE parts mated together (final design) 




Fig. 11. During Listing compatible eye motions, the insertion points move within the region 
internal to the blue curve. The red marker represents the position of the insertion point at the 
primary position, while the green markers are the positions of (two of) the frontal contact 
points on the eye-ball support. The fixation axis at the primary position, /13, points upward. 
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Pulley at eye position B 



Tendon routing at eve position B 



Common tendon routing 




Pulley at eye position A 



Tendcn routing at eye position A 



Fig. 12. Sketch of the tendon's paths, showing the tilting of the routing pulley when the eye 
is rotated from position A to position B (tendons and pulleys diameters not to scale). 



PULLEY 




Fig. 13. Detail of the flange implementing the pointwise pulleys. The tendon slides along a 
section of a toroidal surface. 

A component featuring these characteristics could be implemented, but its miniaturization 
and integration has been considered too complex, so we decided to implement a virtual 
pulley, to be intended as the surface formed by the envelope of all the tilting pulleys for all 
the admissible eye orientations. Since the pulley tilts about the principal axis at point p;, then 
the envelope is a section of a torus with inner diameter equal to the radius of the tendon, 
and external radius equal to the radius selected for the pulley. Then, the implementation of 
the virtual pulley has been obtained by machining a section of a torus on the supporting 
flange as shown in Fig. 13. 
The assembly of the eye-ball and its supporting structure is shown in Fig. 14. 
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8. Sensing and Actuation 

The robot-eye is actuated by four DC servo-motors, producing a maximum output torque of 
12 mNm, and pulling four independent tendons routed to the eye as discussed in the 
previous section. The actuators are integrated within the structure supporting the eye, as 
shown in Fig. 15 and Fig. 16. 




Fig. 14. The eye-ball and its supporting parts. 




v 1 
Fig. 15. Four DC motor actuate the tendons; pulleys route the tendons towards the eye-ball 

The servo-motors are equipped with optical encoders providing the main feedback for the 
control of the ocular movements. A second set of sensors for measuring the mechanical 
tension of the tendons is integrated in the robot. In fact, as the tendons can only apply 
pulling forces, control of the ocular movements can be properly obtained only if slackness of 
the tendons or their excessive loading is avoided. The tension sensors are custom made and 
integrated within the supporting structure of the eye, Fig. 17. 

Each sensor is formed by an infrared led/photodiode couple separated by a mobile shutter, 
preloaded with a phosphore-bronze spring. As shown in Fig. 18, the tension of the tendon 
counter-balances the pre-load force thus varying the amount of IR radiation received. The 
sensor output is the current generated by the photodiode according to the following 
equation: 



I p = fe pr (/)E0 



(10) 



where I p is the current generated by the photodiode, fc p is the characteristic parameter of the 
photodiode, E0 is the IR radiation emitted by the led, and y(f) is a monotonic non-linear 
function of the tendon's tension depending on the system geometry. Each sensor is 
calibrated and a look-up table is used to map its current to tension characteristic. 
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Fig. 16. The body of the MAC-EYE robot. All the motors are integrated within the body; 
furthermore, all the cables from and to the eye-ball and the embedded optical tension 
sensors are routed inside of the structure. 




Fig. 17. Implementation of the embedded sensors for measuring the mechanical tension of 
the tendons. The picture shows the shutter and its preloading spring, cables and tendons. 



9. Robot Control System 

9.1 The control architecture 

The control architecture is implemented as a two level hierarchical system. At low level are 
implemented two control loops for each actuator. In the first loop a P-type control action 

regulates the tension of the i-th tendon at some constant reference value /, , while in the 

second loop a Pi-type action controls the motor velocity as specified by signal q * , see Fig. 
19. The tension feedback control loop makes the eye-ball backdrivable, so the eye can be 
positioned by hand. Both the reference signals q] and /, are generated by the higher level 
control modules which implement a position based Pi-type controller. 
Currently, the major task implemented is that of controlling the eye position during 

emulated saccadic movements. Therefore, coordinated signals g, and^, must be generated 
for all the actuators. In order to compute the appropriate motor commands a geometric and 
a kinematic eye model are implemented as described below. 
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Fig. 18. Sketch of the tension sensor (not to scale). 
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Fig. 19. Robot eye control scheme. The picture shows the control loops related to a single 
actuator. 



Assume that a given reference trajectory for the eye is given and expressed by a rotation 
matrix R (t) , and an angular velocity CO (t) . Then, an algebraic and a differential mapping, 
relating the eye orientation to the displacement of the tendons, can be easily computed. In 
fact, as shown in Fig. 20, for a given eye orientation the tendon starting from the insertion 
point r, remains in contact with the eye-ball up to a point t, (point of tangency). From t, the 
tendon reaches the pointwise pulley and then moves towards the motor which provides the 
actuation force. For a given position of the eye there exists a set of displacements of the free 
end of the tendons corresponding to the amount of rotation to be commanded to the motors. 
According to Fig. 20, for each tendon this amount can be computed (with respect to a 
reference eye position, e.g. the primary position), using the formula: 

Xi=R(^-0oi) (11) 
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Pointwise pulley i 




Insertion point i 



Fig. 20. The plane of tendon i for a generic eye orientation. Angle a, is constant for any eye 
orientation. 

where Xi is the amount of displacement of the free end of the tendon, while $ and tfci are the 
angles (with positive sign), formed by vectors r, and t, at a generic eye orientation, and at the 
primary position, respectively. Signs are chosen so that if Xi < then the corresponding 

tendon must be pulled in order to orient the eye as specified by matrix R (t) . In order to 

compute Xi the angle $can be determined, as follows. According to Fig. 20, the angle oc, must 
be constant for any eye orientation and can be expressed as: 

a^cos-'fj] (12) 

If the eye orientation is known with respect to frame <h>, then r, is known, hence: 

Rd i cos(a,. + <j>i) = r, ■ p, (13) 

and finally, from equations (12) and (13), we obtain: 

The time derivative of $ can be computed by observing that r, =coxr i , where co is the 
angular velocity of the eye, then, the time derivative of equations (13) can be written as: 

-jrin -Pi)= -Rd t sin(a ; + fa )ft (15) 

at 

Therefore, we obtain the following equality: 

{cox fi )• pi = -Rdi sin(a, + fc )fc (16) 

Then, by observing that (CO x f ; ) ■ p; = (r,- x p,- ) ■ co , we have: 

■ 1 



r (r,xp,).» (17) 

|r,-xp,-| 

Then, if R (t) and co (t) are the desired trajectory and angular velocity of the eye, the 
reference motor angles and velocities can be computed using formulas (11), (14) and (17) as: 



Design of a Humanoid Robot Eye 



153 



<?;« = 



-i 



( * A 

»V Pi 



Rdj 
v y 



-7 R 



'?/(')--- — n; — rW x P»)-' 



(18) 



R 



m \r j xp { 



where R„, is the radius of the motor pulley. 



9.2 The control architecture 

The computer architecture of the robot eye is sketched in Fig. 23. A PC based host computer 
implements the high level position based control loop and the motor planning algorithm 
(18). Currently the high level control loop runs at a rate of 125 Hz. 

The low level control algorithms are implemented on a multi-controller custom board, Fig. 
21, featuring four slave micro-controllers (one for each motor), operating in parallel and 
coordinated by a master one managing the communications through CAN bus with higher 
level control modules or other sensing modules (e.g. artificial vestibular system). The master 
and slave microcontrollers communicates using a multiplexed high speed (10 Mbits) serial 
link and operate at a rate of 1.25KHz. 




Fig. 21. The prototype custom embedded real-time controller. The board has dimensions of 
69 x 85mm 2 . 



V 




Fig. 22. Complete stereoscopic robot system 
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Fig. 23. Sketch of the computer control architecture. 

10. Conclusions 

This chapter has shown the feasibility of a tendon driven robot eye featuring the 
implementation of Listing's Law on a mechanical basis. The design of the robot is based on a 
model which is strongly inspired on assumptions derived from physiological evidence. 
The achievements discussed in this chapter represent the starting point for the 
development of a more complex humanoid robot eye. From the mechanical point of 
view the most important advancement is represented by the possibility of extending the 
actuation to six tendons in order to enable the implementation of more complex ocular 
motions as the vestibulo- ocular reflex. From a theoretical point of view a more complete 
analysis of the properties of the proposed eye model could provide further 
understanding of the dynamics of saccadic motions. Furthermore, other issues such as 
eye miniaturization and embedding of image processing algorithms modules for direct 
visual feedback represent important challenges for the development of fully operational 
devices. 
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1. Introduction 

Humanoid robots, because of their similar structure with humans, are expected to operate in 
hazardous and emergency environments. In order to operate in such environments, the 
humanoid robot must be highly autonomous, have a long operation time and take decisions 
based on the environment conditions. Therefore, algorithms for generating in real time the 
humanoid robot gait are central for development of humanoid robot. 

In the early works, the humanoid robot gait is generated based on the data taken from 
human motion (Vukobratovic et al. 1990). Most of the recent works (Roussel 1998, Silva & 
Machado 1998, Channon 1996) consider minimum consumed energy as a criterion for 
humanoid robot gait generation. Roussel (1998) considered the minimum consumed energy 
gait synthesis during walking. The body mass is concentrated on the hip of the biped robot. 
Silva & Machado (1998) considered the body link restricted to the vertical position and the 
body forward velocity to be constant. The consumed energy, related to the walking velocity 
and step length, is analyzed by Channon (1996). The distribution functions of input torque 
are obtained by minimizing the joint torques. 

In our previous works, we considered the humanoid robot gait generation during walking 
and going up-stairs (Capi et al. 2001) and a real time gait generation (Capi et al. 2003). In 
addition of minimum consumed energy (MCE) criteria, minimum torque change (MTC) 
(Uno et al. 1989, Nakano et al. 1999) was also considered. The results showed that MCE and 
MTC gaits have different advantages. Humanoid robot motion generated based on MCE 
criterion was very similar with that of humans. Another advantage of MCE criterion is the 
long operation time when the robot is actuated by a battery. On the other hand, MTC 
humanoid robot motion was more stable due to smooth change of torque and link 
accelerations. 

Motivated from these observations, it will be advantageous to generate the humanoid robot 
motion such that different criteria are satisfied. This belongs to a multiobjective optimization 
problem. In a multiobjective optimization problem there may not exist one solution that is 
the best with respect to all objectives. Usually, the aim is to determine the tradeoff surface, 
which is a set of nondominated solution points, known as Pareto-optimal or noninferior 
solutions. 

The multiobjective problem is almost always solved by combining the multiple objectives 
into one scalar objective using the weighting coefficients. Therefore, to combine different 



158 Humanoid Robots, New Developments 

objectives in a single fitness function, an a-priori decision is needed about the relative 
importance of the objectives, emphasizing a particular type of solution. These techniques 
often require some problem-specific information, such as total range each objective covers. 
In complex problems, such as humanoid robot gait generation, this information is rarely 
known in advance, making the selection of single objective weighting parameters difficult. 
In addition, there is no rational basis of determining adequate weights and the objective 
function so formed may lose significance due to combining non-commensurable objectives. 
To avoid this difficulty, the e-constraint method for multiobjective optimization was 
presented (Becerra & Coello). This method is based on optimization of the most preferred 
objective and considering the other objectives as constraints bounded by some allowable 
levels. These levels are then altered to generate the entire Pareto-optimat set. The most 
obvious weaknesses of this approach are that it is time-consuming and tends to find weakly 
nondominated solutions. 

In this paper, we present a multiobjective evolutionary algorithm (MOEA) (Coello 1999, 
Herrera et al. 1998) technique for humanoid robot gait synthesis. The main advantage of the 
proposed algorithm is that in a single run of evolutionary algorithm, humanoid robot gaits 
with completely different characteristics are generated. Therefore, the humanoid robot can 
switch between different gaits based on the environment conditions. In out method, the 
basic idea is to encode the humanoid robot gait parameters in the genome and take the 
parameters of the non-dominated optimal gaits in the next generation. The specific 
questions we ask in this study are: 1) whether MOEA can successfully generate the 
humanoid robot gait that satisfies different objective functions in a certain degree, 2) 
whether the humanoid robot gait optimized by MOEA in simulation can indeed be helpful 
in hardware implementation. 

In order to answer these questions, we considered the MCE and MTC cost functions as 
criteria for "Bonten-Maru" humanoid robot gait synthesis. We employed a real number 
MOEA. Simulation and experimental results show a good performance of the proposed 
method. The non-dominated optimal Pareto optimal solutions have a good distribution and 
humanoid robot gait varies from satisfying each of both considered objectives to satisfying 
both of them. Therefore, as a specific contribution of proposed method is that in a single run 
of MOEA are generated humanoid robot gaits with completely different characteristics, 
making it possible to select the appropriate gait based on our preferences. In order to further 
verify how the optimized gait will perform on real hardware, we implemented the optimal 
gait using the "Bonten-Maru" humanoid robot. The results show that in addition of energy 
consumption, the optimized gait was stable and with a small impact due to the smooth 
change of the joint torques. 

2. Multiobjective Evolutionary Algorithm 

2.1 Multiobjective Optimization Problem 

In multiobjective optimization problems there are many (possibly conflicting) objectives to be 
optimized, simultaneously. Therefore, there is no longer a single optimal solution but rather a 
whole set of possible solutions of equivalent quality. In contrast to fully ordered scalar search 
spaces, multidimensional search spaces are only partially ordered, i.e. two different solutions 
are related to each other in two possible ways: either one dominates the other or none of them 
is dominated. Consider without loss of generality the following multiobjective maximization 
problem with m decision variables x (parameters) and n objectives: 
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y = f(x) = (/1O1, x m ),..., 

/«(*1> X m )) 

where x = (x l , x m ) e X , y = (j>j, y n ) e 7 and where x is called decision (parameter) 

vector, X parameter space, y objective vector and Y objective space. A decision vector a EX 
is said to dominate a decision vector b £X (also written as a >b) if and only if: 

Vie{l,...., »}:/,(«) > f,(b) a ' ' (2) 

3/ e {1,...., n}:fj{a)> fj{b) 
The decision vector a is called Pareto-optimal if and only if a is nondominated regarding the 
whole parameter space X. Pareto-optimal parameter vectors cannot be improved in any 
objective without causing degradation in at least one of the other objectives. They represent 
in that sense globally optimal solutions. Note that a Pareto-optimal set does not necessarily 
contain all Pareto optimal solutions in X. The set of objective vectors corresponding to a set 
of Pareto-optimal parameter vectors is called "Pareto-optimal front". 

In extending the ideas of SOEAs to multiobjective cases, two major problems must be 
addressed: — How to accomplish fitness assignment and selection in order to guide the 
search towards the Pareto-optimal set? — How to maintain a diverse population in order to 
prevent premature convergence and achieve a well distributed, wide spread trade-off front? 
Note that the objective function itself no longer qualifies as fitness function since it is a 
vector valued and fitness has to be a scalar value. Different approaches to relate the fitness 
function to the objective function can be classified with regard to the first issue. The second 
problem is usually solved by introducing elitism and intermediate recombination. Elitism is 
a way to ensure that good individuals do not get lost (by mutation or set reduction), simply 
by storing them away in an external set, which only participates in selection. Intermediate 
recombination, on the other hand, averages the parameter vectors of two parents in order to 
generate one offspring. 

2.2 Nondominated Sorting Genetic Algorithm 

NSGA was employed to evolve the neural controller where the weight connections are 
encoded as real numbers. Dias & Vasconcelos (2002) compared the NSGA with four others 
multiobjective evolutionary algorithms using two test problems. The NSGA performed 
better than the others did, showing that it can be successfully used to find multiple Pareto- 
optimal solutions. In NSGA, before selection is performed, the population is ranked on the 
basis of domination using Pareto ranking, as shown in Fig. 1. All nondominated individuals 
are classified in one category with a dummy fitness value, which is proportional to the 
population size (Srivinas, & Deb 1995). After this, the selection, crossover, and mutation 
usual operators are performed. 

In the ranking procedure, the nondominated individuals in the current population are first 
identified. Then, these individuals are assumed to constitute the first nondominated front 
with a large dummy fitness value (Srivinas, & Deb 1995). The same fitness value is assigned 
to all of them. In order to maintain diversity in the population, a sharing method is then 
applied. Afterwards, the individuals of the first front are ignored temporarily and the rest of 
population is processed in the same way to identify individuals for the second 
nondominated front. A dummy fitness value that is kept smaller than the minimum shared 
dummy fitness of the previous front is assigned to all individuals belonging to the new 
front. This process continues until the whole population is classified into nondominated 
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fronts. Since the nondominated fronts are defined, the population is then reproduced 
according to the dummy fitness values. 



Start 
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Fig. 1. Flowchart of NSGA. 

Fitness Sharing: In genetic algorithms, sharing techniques aim at encouraging the formation 
and maintenance of stable subpopulations or niches (Zitzler et al. 2000). This is achieved by 
degrading the fitness value of points belonging to a same niche in some space. 
Consequently, points that are very close to, with respect to some space (decision space X in 
this paper), will have their dummy fitness function value more degraded. The fitness value 
degradation of near individuals can be executed using (3) and (4), where the parameter dy is 
the variable distance between two individuals i and j, and a s imred is the maximum distance 
allowed between any two individuals to become members of a same niche. In addition, dfi is 
the dummy fitness value assigned to individual i in the current front and dfi is its 
corresponding shared value. N pop is the number of individuals in the population. The 
sharing function (Sh) measures the similarity level between two individuals. The effect of 
this scheme is to encourage search in unexplored regions. For details about niching 
techniques, see Sareni et al. (1998). 
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3. Optimal Gait Generation 

During motion, the arms of the humanoid robot will be fixed on the chest. Therefore, it 
can be considered as a five-link biped robot in the saggital plane, as shown in Fig. 2. 
The motion of the biped robot is considered to be composed from a single support 
phase and an instantaneous double support phase. The friction force between the 
robot's feet and the ground is considered to be great enough to prevent sliding. During 
the single support phase, the ZMP must be within the sole length, so the contact 
between the foot and the ground will remain. In our work, we calculate the ZMP by 
considering the link mass concentrated at one point. To have a stable periodic walking 
motion, when the swing foot touches the ground, the ZMP must jump in its sole. This is 
realized by accelerating the body link. To have an easier relative motion of the body, the 
coordinate system from the ankle joint of the supporting leg is moved transitionally to 
the waist of the robot (O1X1Z1). Referring to the new coordinate system, the ZMP 
position is written as follows: 

5 5 

^rriiCz; +z w +g z )x ; -^m.fx; + x w )(z ; +z w ) 

X ZMP=~ 5 ~ . () 

^]m i (^ i +z w +g z ) 

i=i 



where mi is mass of the particle "i" , x w and z w are the coordinates of the waist with respect 
to the coordinate system at the ankle joint of supporting leg, x ( and z ; are the coordinates 

of the mass particle "i" with respect to the O1X1Z1 coordinate system, x. and z. are the 
acceleration of the mass particle "i" with respect to the O1X1Z1 coordinate system. 
Based on the formula (3), if the position, 7L i ,z i , and acceleration, X;,z ; , of the leg part 
(i=l,2,4,5), the body angle, 9 3 , and body angular velocity, 8 3 , are known, then because 

x 3 ,z 3 are functions of 13,93,83,83, it is easy to calculate the body angular acceleration 
based on the ZMP position. Let (0) and (f) be the indexes at the beginning and at the end of 
the step, respectively. At the beginning of the step, 8 30 causes the ZMP to be in the position 
ZMPj ump . At the end of the step, the angular acceleration 8 & is calculated in order to have 

the ZMP at the position ZMPf, so that the difference between 8 3f and 8 30 is minimal. 
Therefore, the torque necessary to change the acceleration of the body link will also be 
minimal. 

3.1 Objective Functions 

The gait synthesis problem, with respect to walking or going up-stairs, consists on finding 
the joint angle trajectories, to connect the first and last posture of the biped robot for 
which the consumed energy and torque change are minimal. For the MCE cost function, it 
can be assumed that the energy to control the position of the robot is proportional to the 
integration of the square of the torque with respect to time, because the joint torque is 
proportional with current. Therefore, minimizing the joint torque can solve the MCE 
problem (Capi 2002). 
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Fig. 2. Five-link humanoid robot. 

The cost function J, which is a quantity proportional to the energy required for the motion, is 

defined as follows: 



i tf 



T Tdt + A^ ump At+j*Cdt), 



J = -(JT-Tdt + Axr ump At+ j(Jdt), (6) 

1 o o 

where: tf is the step time, T is the torque vector, Ax ■ and At are the addition torque 

applied to the body link to cause the ZMP to jump and its duration time, and C is the 
constraint function, given as follows: 

fO - if the constraints are satisfied, 
C = 

I Cj - if the constraints are not satisfied, 

c denotes the penalty function vector. We consider the following constraints for our system. 

1) The walking to be stable or the ZMP to be within the sole length. 

2) The distance between the hip and ankle joint of the swing leg must not be longer then 
the length of the extended leg. 

3) The swing foot must not touch the ground prematurely. 

The torque vector is calculated from the inverse dynamics of the five-link biped robot as : 

J(0)0 + X(0)0 2 +Y0 + 2(0) = z. (7) 

where J(0) is the mass matrix (5x5), X{6) is the matrix of centrifugal coefficients (5x5), Y 
is the matrix of Coriolis coefficients (5x5), Z(6) is the vector of gravity terms (5x1), T is the 

generalized torque vector (5x1), and 6, 6, 6 are 5x1 vectors of joint variables, joint angular 
velocities and joint angular accelerations, respectively. 

The MTC model (Uno 1989, Nakano 1999) is based on smoothness at the torque level. The 
cost is the integrated squared torque change summed over the joints and the movement. In 
the MTC, the objective function to be minimized is expressed by: 
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4. Boundary Conditions and GA Variables 

To have a continuous periodic motion, the humanoid robot posture has to be the same at the 
beginning and at the end of the step. Therefore, the following relations must be satisfied: 

9 10= 9 51/ 6 20= 6 4f/ 9 lt~ 9 50/ 9 2f= 9 40, 9 30 = 6 3f- (9) 

In order to find the best posture, the optimum value of 6 io, 6 20 and 9 30 must be 
determined by GA. For a given step length, it is easy to calculate 9 40 and 9 50. When 
referring to Figure 2, it is clear that links 1, 2, 4 at the beginning of the step and links 2, 4, 5 at 
the end of the step, change the direction of rotation. Therefore, we can write: 

9 10= 9 20= 9 40= 9 2f= 9 4f= 9 5f=0. (10) 

The angular velocity of link 1 at the end of the step and link 5 at the beginning of the step is 
considered to be the same. In order to find the best value of angular velocity, we consider it 
as one variable of GA, because the rotation direction of these links does not change. GA will 
determine the optimal value of the angular velocity of the body link, which is considered to 
be the same at the beginning and at the end of the step. The following relations are 
considered for the angular acceleration: 



: 9 5f ,9 2 



(11) 



In this way, during the instantaneous double support phase, we don't need to apply an 
extra torque to change the angular acceleration of the links. To find the upper body angle 
trajectory, an intermediate angle 9 3 P and its passing time ta are considered as GA variables. 
To determine the angle trajectories of the swing leg, the coordinates of an intermediate point 
P(x p ,Zp) and their passing time t p , are also considered as GA variables. The searching area 
for this point is shown in Figure 2. Based on the number of constraints, the degree of the 
time polynomial for 9 1, 9 2, 9 3, 9 4 and 9 5 are 3, 3, 7,6 and 6, respectively. 





Body 


Lower leg 


Upper leg 


Lower leg + foot 


Mass [kg] 


12 


2.93 


3.89 


4.09 


Inertia [kg m 2 ] 


0.19 


0.014 


0.002 


0.017 


Length [m] 


0.3 


0.2 


0.204 


0.284 


CoM dist.[m] 


0.3 


0.09 


0.1 


0.136 



Table 1. "Bonten-Maru" humanoid robot link parameters. 



5. Results 

5.1 "Bonten-Maru" Humanoid Robot 

In the simulations and experiments, we use the the "Bonten-Maru" humanoid robot (Nasu 
et al. 2002, Takeda et al. 2001). The parameter values are presented in Table 1 and the robot 
is shown in Fig. 3(a). The "Bonten-Maru I" humanoid robot is 1.2 m high and weights 32 kg, 
like an 8 years old child. The "Bonten-Maru I" is a research prototype, and as such has 
undergone some refinement as different research direction are considered. During the 
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design process, some predefined degree of stiffness, accuracy, repeatability, and other 
design factors have been taken into consideration. The link dimensions are determined such 
that to mimic as much as possible the humans. In the "Bonten-Maru" humanoid robot, a DC 
motor actuates each joint. The rotation motion is transmitted by a timing belt and harmonic 
drive reduction system. Under each foot are four force sensors, two at the toe and two across 
the heel. These provide a good indication of both contact with the ground, and the ZMP 
position. The head unit has two CCD cameras (542x492 pixels, Monochrome), which are 
connected to the PC by video capture board. A Celeron based microcomputer (PC/AT 
compatible) is used to control the system. 

The dof are presented in Fig. 3(b). The high number of dof gives the "Bonten-Maru I" 
humanoid robot the possibility to realize complex motions. The hip is a ball-joint, permitting 
three dof; the knee joint one dof; the ankle is a double-axis design, permitting two. The 
shoulder has two dof, the elbow and wrist one dof. The DC servomotors act across the three 
joints of the head, where is mounted the eye system, enabling a total of three dof. The 
distribution of dof is similar with the dof in human limbs. 




Rolling 




(b) 



Fig. 3. "Bonten-Maru" humanoid robot. 



5.2 Simulation and Experimental Results 

Due to difficulties of binary representation when dealing with continuous search space with 
large dimension, real coded GA (Herrera 1998) is used in this study. The decision variables 
are represented by real numbers within their lower and upper limits. We employed a 
standard crossover operator and the non-uniform mutation. In all optimization runs, 
crossover and mutation probabilities were chosen as 0.9 and 0.3, respectively. On all 
optimization runs, the population size was selected as 50 individuals and the optimization 
terminated after 100 generations. The maximum size of the Pareto-optimal set was chosen as 
50 solutions. 
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Fig. 5. Pareto front of nondominated solutions after 100 generations. 

Based on the parameters of the "Bonten-Maru" humanoid robot the step length used in the 
simulations varies from 0.2m to 0.55m. The bounds, within which the solution is sought, 
change according to the step length and step time. In the following, we present the results 
for the step length 0.42m and step time 1.2s. 

Fig. 4 shows the Pareto optimal front for generations 1, 50 and 100. During the first 50 
generations there is a great improvement on the quality and distribution of Pareto optimal 
solutions. From this figure, it can be deduced that the MOEA is equally capable of finding 
the best solution for each objective when two conflicting objectives are considered 
simultaneously. 
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Fig. 6. Different results from Pareto-front solutions. 

Fig. 5 shows the Pareto-optimal trade-off front after 100 generations. We can observe the 
existence of a clear tradeoff between the two objectives. In addition, the obtained reference 
solution set has a good distribution (similar to uniform distribution). One of the interesting 
features of the resulting Pareto front is the almost exponential relation between the MCE 
and MTC cost functions. Results in Box 1 and Box 5 are at the extreme ends of the Pareto 
front. Boxl represents Pareto solutions with high value of MTC function, but low energy 
consumption. Based on the Pareto-optimal solutions, we can choose whether to go for 
minimal CE (Box 1 in Fig. 4) at the expense of a less smoothens in the torque or choose some 
intermediate result. If we are interested for a low consumed energy humanoid robot gait, 
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without neglecting the smoothness in the torque change, the results shown in Boxes 2, 3 are 
the most important. The results in Box 2, show that by a small increase in the energy 
consumption (2.2%), we can decrease the MTC fitness function by around 12.1%. Also, the 
energy can be reduced by 14.5% for a small increase in the MTC cost function (Box 4). 
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Fig. 8. Video capture of robot motion. 
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The torque vector (xi) and the optimal gaits for different results of Pareto front solutions are 
shown in Fig. 6. The robot posture is straighter, similar to humans, for MCE cost function 
(Fig. 6(a)). Torque value is low for MCE gait and the torques change smoothly for MTC gait 
(Fig. 6(b)). The optimal gait generated by Box 3 solutions satisfies both objective functions. 
The energy consumption is increased by 9% but on the other hand the value of MTC cost 
function is decreased by 19.2%. 

The ZMP trajectory is presented in Fig. 7 for humanoid robot gait generated by Box 3 
result. The ZMP is always between the dotted lines, which present the length of the foot. 
At the end of the step, the ZMP is at the position ZMPf, as shown in Fig. 2. At the 
beginning of the step, the ZMP is not exactly at the position ZMPj ump because of the 
foot's mass. It should be noted that the mass of the lower leg is different when it is in 
supporting leg or swing leg. 

In order to investigate how the optimized gaits in simulation will perform in real hardware, 
we transferred the optimal gaits that satisfy both objective functions on the "Bonten-Maru" 
humanoid robot (Fig. 8). The experimental results show that in addition of reduction in 
energy consumption, the humanoid robot gait generated by Box 3 solutions was stable. The 
impact of the foot with the ground was small. 

6. Conclusion 

This paper proposed a new method for humanoid robot gait generation based on 
several objective functions. The proposed method is based on multiobjective 
evolutionary algorithm. In our work, we considered two competing objective 
functions: MCE and MTC. Based on simulation and experimental results, we 
conclude: 

• Multiobjective evolution is efficient because optimal humanoid robot gaits with 
completely different characteristics can be found in one simulation run. 

• The nondominated solutions in the obtained Pareto-optimal set are well distributed 
and have satisfactory diversity characteristics. 

• The optimal gaits generated by simulation gave good performance when they were 
tested in the real hardware of "Bonten-Maru" humanoid robot. 

• The optimal gait reduces the energy consumption and increases the stability during 
the robot motion. 

In the future, it will be interesting to investigate if the robot can learn in real time to switch 
between different gaits based on the environment conditions. In uneven terrains MTC gaits 
will be more 
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An Incremental Fuzzy Algorithm for 
The Balance of Humanoid Robots 
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Mexico, Germany 

1. Introduction 

Humanoid robots base their appearance on the human body (Goddard et al., 1992; Kanehira 
et al., 2002; Konno et al., 2000). Minimalist constructions have at least a torso with a head, 
arms or legs, while more elaborated ones include devices that assemble, for example, human 
face parts, such as eyes, mouth, and nose, or even include materials similar to skin. 
Humanoid robots are systems with a very high complexity, because they aim to look like 
humans and to behave as they do. 

Mechanical control, sensing, and adaptive behaviour are the constituting logical parts 
of the robot that allow it to "behave" like a human being. Normally, researchers 
study these components by modelling only a mechanical part of the humanoid robot. 
For example, artificial intelligence and cognitive science researches consider the 
robot from the waist up, because its visual sensing is located in its head, and its 
behavior with gestures normally uses its face or arms. Some engineers are mostly 
interested in the autonomy of the robot and consider it from the waist down. They 
develop mathematical models that control the balance of the robot and the 
movement of its legs (Miller, 1994; Yamaguchi et al., 1999; Taga et al., 1991), 
allowing the robot to walk, one of the fundamental behaviours that characterizes 
human beings. 

Examples of such mathematical models are static and dynamic walking. The static 
walking model controls the robot to maintain its center of gravity (COG) inside a stable 
support region, while the dynamic walking model maintains the zero moment point 
(ZMP) inside the support region. Kajita et al. (1992) designed and developed an almost 
ideal 2-D model of a biped robot. He supposed, for simplicity, that the robot's COG 
moves horizontally and he developed a control law for initiation, continuation and 
termination of the walking process. Zhen and Shen (1990) proposed a scheme to enable 
robot climbing on inclined surfaces. Force sensors placed in the robot's feet detect 
transitions of the terrain type, and motor movements correspondingly compensate the 
inclination of robot. 

The models mentioned above can be, however, computationally very expensive, and 
prohibitive for its implementation in microcontrollers. Control algorithms for a 
stable walking must be sufficiently robust and smooth, to accomplish a balance 
correction without putting in risk the mechanical stability of the robot. This could be 
resolved by using a controller that modifies its parameters according to a 
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mathematical model, which considers certain performance degree required to offer 
the enough smoothness. 

Fuzzy Logic is especially advantageous for problems which cannot be easily represented 
by a fully mathematical model, because the process is mathematically too complex and 
computationally expensive, or some data is either unavailable or incomplete. The real- 
world language used in Fuzzy Control also enables the incorporation of approximate 
human logic into computers. It allows, for example, partial truths or multi-value truths 
within the model. Using linguistic modeling, as opposed to mathematical modeling, 
greatly simplifies system design and modification. It generally leads to quicker 
development cycles, easy programming, and fairly accurate control. It is important, 
however, to underline that fuzzy logic solutions are usually not aimed at achieving the 
computational precision of traditional techniques, but aims at finding acceptable solutions 
in shorter time. 

The Incremental Fuzzy Control algorithm fulfils the robustness and smoothness 
requirements mentioned above, even its implementation in microcontrollers. Such an 
algorithm is relatively simple and computationally more efficient than other adaptive 
control algorithms, because it consists of only four fuzzy rules. The algorithm demonstrates 
a smooth balance control response between the walking algorithm and the lateral plane 
control: one adaptive gain varies incrementally depending on the required performance 
degree. 

The objective of this chapter is to describe the incremental fuzzy algorithm, used to 
control the balance of lateral plane movements of humanoid robots. This fuzzy control 
algorithm is computationally economic and allows a condensed implementation. The 
algorithm was implemented in a PICF873 microcontroller. We begin on the next section 
with the analysis of the balance problem, and follow later with the description of the 
controller structure. Afterwards, we explain important considerations about the 
modification of its parameters. Finally, we present experimental results of algorithm, used 
on a real humanoid robot, "Dany walker", developed at the Institut fur Informatik of the 
Freie Universitat Berlin. 

2. Robot Structure 

The humanoid robot "Dany Walker" used in our research was built only from the waist 
down. It consists of 10 low-density aluminium links. They are rotational on the pitch axis at 
the hip, knee and ankle. Each link consists of a modular structure. The links form a biped 
robot with 10 degrees of freedom, see Fig. 1. 

The robot structure and its mass distribution affect directly the dynamic of the 
humanoid (Cuevas et al., 2004), therefore, the movement of the Center of Masses 
(COM) has a significant influence on the robot stability. In order to achieve static 
stability, we placed the COM as low as possible. To such purpose, our design uses 
short legs, see Fig. 2 

To compensate the disturbances during walking, our construction enables lateral 
movements of the robot. Thus, it was possible to control the lateral balance of the robot by 
swaying the waist using four motors in the lateral plane: two at the waist and two at the 
ankles, see Fig. 3. 
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Fig. 1 The biped robot "Dany Walker" 




Fig. 2. Dany Walker's COM location. 
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Lateral movement 
Fig. 3. Lateral balance of the motors. 
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3. Balance control criterion 

We used the dynamic walking model to define our balance criterion. It consists of 
maintaining the Zero Moment Point (ZMP) inside the support region (Vukobratovic & Turkic, 
1969; Vukobratovic, 1973). We implemented a feedback-force system to calculate the ZMP, 
and feed it in to the fuzzy PD incremental controller to calculate the ZMP error. Then, the 
controller adjusts the lateral robot's positions to maintain the ZMP point inside of the 
support region. 

To achieve stable dynamic walking, the change between simple supports phase and double 
supports phase should be smooth. In the beginning of the double supports phase, the foot 
returns from the air and impacts against the floor, generating strong forces that affect the 
walking balance (Cuevas et al., 2005). The intensity of these forces is controlled by imposing 
velocity and acceleration conditions on saggital motion trajectories. This is achieved by 
using smooth cubic interpolation to describe the trajectories. In this chapter, we only discuss 
the control of the lateral motion (balance). 

3.1 Zero Moment Point (ZMP) 

The ZMP is the point on the ground where the sum of all momentums is zero. Using this 
principle, the ZMP is computed as follows: 

Z i m i ( z+ s) x i - Z , m < x z - - Z ,- h K CO 

X ZMP 

z.i m .( z+ g) 

_ Z, m > ( z +g)y,- zZ, m i xz >~ Z A ft* ( 2 ) 

y zmp ~ > 

Z/HiO+g) 

where (xzmp, J/zmp/0) are the ZMP coordinates, (Xi,yi,Zj) is the mass center of the link i in the 
coordinate system, m, is the mass of the link i, and g is the gravitational acceleration. !« and 
Ij y are the inertia moment components, 0. and Q are the angular velocity around the axes 

x and y, taken as a point from the mass center of the link i. The force sensor values are 
directly used to calculate the ZM. For the lateral control, it is only necessary to know the 
ZMP value for one axis. Thus, the ZMP calculus is simplified using the formula 



Z/A 



p = M , (3) 

1 ZMP 3 



I/" 



where fi represents the force at the i sensor, and r, represents the distance between the 
coordinate origin and the point where the sensor is located. Figure 4 shows the distribution 
of sensors (marked with tree circles) used for each robot's foot. 
The total ZMP is obtained by the difference between the ZMPs at each foot: 

Total P = P - P (A) 

JU '"' - 1 ZMP x ZMP] x ZMP2 ' V ' 

where Pzmpi is the ZMP for one foot and P ZUP2 is the ZMP for the other. 

Figure 5 shows the ZMP point (black point) for two robot's standing cases, one before to 
give a step (left), and other after give a step (right). The pointed line represents the support 
polygon. 
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Fig. 4. Sensors distribution at the robot's foot. 











) \ 


PzMPl 


\ 










o 


f 


o 

PzM2 










PzMPl 






Total_P ZM p \ 




PzMPl 










^ 

Tota 


r 

L—PzMP 









Fig. 5. The black point represents the Robot ZMP before giving a step (left), and after giving 
a step (right). 



4. The Fuzzy PD incremental algorithm. 

We propose the fuzzy PD incremental control algorithm, as a variant for the fuzzy PD controller 
(Sanchez et al., 1998), to implement the biped balance control. The fuzzy PD incremental control 
algorithm consists of only four rules and has the structure illustrated in Fig. 6. 
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Fig. 6. Fuzzy PD incremental algorithm structure. 

The gains G u , G e and G r are determined by tuning and they correspond respectively to the 
output gains, the error (ZMP error) and error rate (ZMP rate) gains. The value u* is the 
defuzzyficated output or "crisp" output. The value u is 
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u = < 



C u *(u) if\e\<0 (for t = Q, G, BC =0) 



(5) 



G inc +Inc if\e\>0 

Where e is the error (error*G e ), is an error boundary selected by tuning, and G mc is the 
incremental gain obtained adding the increment "Inc". 
Figure 7 shows the flow diagram for the incremental gain of u. 




Fig. 7. Flow diagram for the incremental gain of G u . 

Figure 8 shows the area where the absolute error is evaluated and the controller output is 
incremental (u=Gi„ c +Inc). 



4.1 Fuzzyfication 

As is shown in figure 9, there are two inputs to the controller: error and rate. The error is 
defined as: 



Rate it is defined as it follows: 



error = setpoint - y 



rate = (ce - pe) / sp 



(6) 



(7) 



Where ce is the current error, pe is the previous error and sp is the sampling period. Current 
and previous error, are referred to an error without gain. The fuzzy controller has a single 
incremental output, which is used to control the process 
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Error + - 

Fig. 8. Fuzzy PD incremental absolute error area. 

The input an output membership functions for the fuzzy controller are shown in Fig. 9 and 

Fig. 10, respectively. Fig. 9 shows that each input has two linguistic terms. For the error 

input are: Ge* negative error (en) and Ge* positive error (ep) and for the rate input are: 

Gr*negative rate (rn) and Gr * positive rate (rp), while the output fuzzy terms are shown in Fig. 

10 and they are: Negative output (on), Zero output (oz) and Positive output (op). 

As shown in Fig. 9, the same function is applied for error and rate but with different scaling 

factors: Ge and Gr respectively. 

H and L are two positive constants to be determined. For convenience we will take H=L to 

reduce the number of control parameters to be determined. The membership functions for 

the input variables, error and rate, are defined by: 



Pep 

P,„ 

Pm 



L + (G e * error) 

2L 
L-(G c * error) 

2L 

L + (Gr * rate) 

2L 
L-{Gr*rate) 

21 



(8) 
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Fig. 9. Input membership functions. 
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Fig. 10. Input membership functions. 
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4.2 Fuzzy rules and defuzzyfication. 

There exist four rules to evaluate the fuzzy PD incremental controller (Miller 1994): 

Rl. If error is ep and rate is rp then output is op 

R2. If error is ep and rate is rn then output is oz 

R3. If error is en and rate is rp then output is oz 

R4. If error is en and rafe is rn then output is on 

The determination of these rules can be accomplished easily if the system evolution is 
analyzed in the different operation points. For example, when the error and the rate increase 
(rule 1), it means that the system response decreases and moves away from the setpoint, for this 
reason it is necessary to apply a positive stimulus that allows to increase the system output. The 
figure 11 shows the determination of the different rules based on the system response. 
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Fig. 11. Determination of the different rules based on the system response (see Text). 
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For the fuzzy condensed controller proposed, the input error and rate values ranges can be 
represented in 20 input regions (IC), like is shown in Fig. 12. If the membership functions 
are evaluated, the 4 control rules, the simplification H=L, and the defuzzyfication is applied 
in each one of the 20 inputs combinations, then 9 equations can be obtained (Sanchez et al., 
1998), which can determine the control signal u that should be applied, depending on the 
region it is located. In other words, to implement the fuzzy condensed controller, only will 
be necessary to know the region in which the inputs variables are located and later evaluate 
the corresponding equation for this region. For example, the first equation acts in regions 
IC1, IC2, IC5, IC6. The figure 13 shows the control surface of the fuzzy condensed controller 
considering H=L=1. 
Finally, the defuzzyfication method used is the gravity center, in this case is represented by: 



Mr4(x) + \Mr2(x)+Mr3(x) ) + MrI(x) 



(9) 



5. Real time results 

The first test applied to the balance controller was an x-direction impulse. The robot was 
standing in balance and then a push in the lateral direction was applied to the biped robot. 
Figure 14 shows the evolution of the ZMP value in centimeters during 10 seconds. 
Figure 15 shows the error and rate evolution during the application of the x-direction 
impulse. The incremental PD fuzzy controller's parameters were: Ge=2, Gr=2, and Cu-1. 
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Fig. 13. Control surface of the fuzzy condensed controller considering H=L=1. 
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Fig. 14. ZMP value evolution achieved by the controller, for a x-direction impulse. 
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Fig. 15. Error and rate evolution for a x-direction impulse. 

Finally, a walking test balance was applied to the robot. The robot gave some steps in a flat 
surface, while the balance controller compensated the ZMP value. Figure 16 shows the 
evolution of the ZMP value (in centimeters) achieved by the controller during 15 seconds. 
Figure 17 shows error and rate evolution during the robots walk. The incremental PD fuzzy 
controller's parameters were: Ge=2, Gr=2, and Gu =1. 




Fig. 16. ZMP value evolution achieved by the controller during the robot walk. 
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Fig. 17. Error and rate evolution during the robot walk. 

The motor's position output was computed by the controller during the walk. It is showed 
at Figure 18. This motor's position value is only just for one waist motor. 
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Fig. 18. Motor's position output during the robot walk.. 
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6. Conclusion 

Figure 11 shows the controller's performance and response for an x-direction impulse. The 

response was fast, approximately 2 seconds, until the controller reached a ZMP value near 

to zero. This feature allows the biped robot to gain stability even during walking (figure 13), 

maintaining the ZMP always inside the support polygon. 

Our experiments with the fuzzy PD incremental controller algorithm demonstrated that it is 

computationally economic, all running in a PIC microcontroller, and appropriated for 

balance control. The algorithm was successfully used in real-time with the biped robot 

"Dany walker" (videos available at http://www.inf.fu-berlin.de/~zaldivar). 

The algorithm proposed in this chapter could be also used in other robots structures with a 

different dynamic, and even with other degrees of freedom. It would be only necessary to 

adjust the controller's gain parameters to the particular structure. 

We plan to use the information given by an inclinometer along with the ZMP value. In this 

case, the goal of the bipedal balance robot control will be to achieve an inclination value of 

cero and to maintain the ZMP at the center, or inside of the support polygon. 

The bipedal robot used in this work is part of a project that is being developed at the Freie 

Universitat Berlin. 
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1. Introduction 

In order for humans and robots to cooperate in an effective manner, it must be possible for 
them to communicate. Spoken language is an obvious candidate for providing a means of 
communication. In previous research, we developed an integrated platform that combined 
visual scene interpretation with speech processing to provide input to a language learning 
model. The system was demonstrated to learn a rich set of sentence-meaning mappings that 
could allow it to construct the appropriate meanings for new sentences in a generalization 
task. We subsequently extended the system not only to understand what it hears, but also to 
describe what it sees and to interact with the human user. This is a natural extension of the 
knowledge of sentence-to-meaning mappings that is now applied in the inverse scene-to- 
sentence sense (Dominey & Boucher 2005). The current chapter extends this work to analyse 
how the spoken language can be used by human users to communicate with a Khepera 
navigator, a Lynxmotion 6DOF manipulator arm, and the Kawada Industries HRP-2 
Humanoid, to program the robots' behavior in cooperative tasks, such as working together 
to perform an object transportation task, or to assemble a piece of furniture. In this 
framework, a system for Spoken Language Programming (SLP) is presented. The objectives 
of the system are to 1. Allow the human to impart knowledge of how to accomplish a 
cooperative task to the robot, in the form of a sensory-motor action plan. 2. To allow the user 
to test and modify the learned plans. 3. To do this in a semi-natural and real-time manner 
using spoken language and visual observation/ demonstration. 4. When possible, to exploit 
knowledge from studies of cognitive development in making implementation choices. With 
respect to cognitive development, in addition to the construction grammar model, we also 
exploit the concept of "shared intentions" from developmental cognition as goal-directed 
action plans that will be shared by the human and robot during cooperative activities. 
Results from several experiments with the SLP system employed on the different platforms 
are presented. The SLP is evaluated in terms of the changes in efficiency as revealed by task 
completion time and number of command operations required to accomplish the tasks. 
Finally, in addition to language, we investingate how vision can be used by the robot as well 
to observe human activity in order to able to take part in the observed activities. At the 
interface of cognitive development and robotics, the results are interesting in that they (1) 
provide concrete demonstration of how cognitive science can contribute to human-robot 
interaction fidelity, and (2) they suggest how robots might be used to experiment with 
theories on the implementation of cognition in the developing human. 
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2. Language and Meaning 

Crangle and Suppes (1994) stated: "(1) the user should not have to become a programmer, or 
rely on a programmer, to alter the robot's behavior, and (2) the user should not have to learn 
specialized technical vocabularies to request action from a robot." Spoken language 
provides a very rich and direct means of communication between cooperating humans 
(Pickering & Garrod 2004). Language essentially provides a vector for the transmission of 
meaning between agents, and should thus be well adapted for allowing humans to transmit 
meaning to robots. This raises technical issues of how to extract meaning from language. 
Construction grammar (CxG) provides a linguistic formalism for achieving the required link 
from language to meaning (Goldberg 2003). Indeed, grammatical constructions define the 
direct mapping from sentences to meaning. Meaning of a sentence such as (1) is represented 
in a predicate-argument (PA) structure as in (2), based on generalized abstract structures as 
in (3). The power of these constructions is that they employ abstract argument "variables" 
that can take an open set of values. 

(1) John put the ball on the table. 

(2) Transportflohn, Ball, Table) 

(3) Event(Agent, Object, Recipient) 

We previously developed a system that generates PA representations (i.e. meanings) from 
video event sequences. When humans performed events and described what they were 
doing, the resulting <sentence, meaning> input pairs allowed a separate learning system to 
acquire a set of grammatical constructions defining the sentences. The resulting system 
could describe new events and answer questions with the resulting set of learned 
grammatical constructions (Dominey & Boucher 2005). 

PA representations can be applied to commanding actions as well as describing them. 
Hence the CxG framework for mapping between sentences and their PA meaning can be 
applied to action commands as well. In either case, the richness of the language employed 
will be constrained by the richness of the perceptual and action PA representations of the 
target robot system. In the current research we examine how simple commands and 
grammatical constructions can be used for action command in HRI. Part of the challenge is 
to define an intermediate layer of language-commanded robot actions that are well adapted 
to a class of HRI cooperation tasks. 

This is similar to the language-based task analysis in Lauria et al. (2002). An essential part of 
the analysis we perform concerns examining a given task scenario and determining the set 
of action/ command primitives that satisfy two requirements. 1. They should allow a logical 
decomposition of the set of tasks into units that are neither too small (i.e. move a single 
joint) nor too large (perform the whole task). 2. They should be of general utility so that 
different tasks can be performed with the same set of primitives. 

3. Spoken Language Programming 

For some tasks (e.g. navigating with a map) the sequence of human and robot actions 
required to achieve the task can be easily determined before beginning execution. Other 
tasks may require active exploration of the task space during execution in order to find a 
solution. In the first case, the user can tell the robot what to do before beginning 
execution, while in the second, instruction will take place during the actual execution of 
the task. 
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In this context, (Lauria et al. 2002) asked naive subjects to provide verbal instructions to a 
robot in a miniature-town navigation task. Based on an analysis of the resulting speech 
corpora, they identified a set of verbal action chunks that could map onto robot control 
primitives. More recently, they demonstrated the effectiveness of navigation instructions 
translated into these primitive procedures for actual robot navigation (Kyriacou et al. 2005). 
This research indicates the importance of implementing the mapping between language and 
behavioural primitives for high-level natural language instruction or programming. The 
current study extends these results by introducing a conditional (e.g. if-then) component to 
the programming. 
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Fig. 1. Khepera robot and object transportation scenario. A. Physical set up. B. Labeled 
schematic representation. 

Figure 1 illustrates a given scenario for HRI. In Figure 1A we see the Khepera robot on a 
table top, and in Figure IB a schematic of the robot (represented as the gray disc) in a two 
arm maze. Consider a task in which Userl sends the Khepera robot to User2 who gives it an 
object to transport back to Userl. Getting to User2 requires a conditional choice between two 
paths based on the location of an obstacle that is not known in advance, at the locations 
indicated by the dotted lines. Once the robot determines the location of the obstacle, it 
should choose the path that is not blocked, and make its way to the end of the arm. There it 
should turn around, and wait for User2, to place an object on its carrying surface. When 
User2 has performed this, Userl will say "continue" and the robot should then return to the 
beginning of the maze arm where Userl will take the transported block, and the process 
then continues. 

On-Line Commanding: The simplest solution for controlling the robot in this task, which 
involves no learning, is for Userl simply to tell the robot what to do, step by step. 
Depending on whether the obstacle is at the left or right arm position, the human Userl 
would decide whether to tell the robot to take the right or left pathway. Likewise, once at 
the end point, Userl would wait until the User2 placed the object before commanding the 
robot to turn around and come back. 

Programming a Pre-Specified Problem: Again, in many cases, the problem is known to the 
user in advance (possibly because the user has now "walked" the robot through the 
problem as just described) and can be specified prior to execution. 

The objective of SLP is to provide a framework in which non-specialist humans can 
explain such a task to a robot, using simple spoken language. While the task 
described above remains relatively simple, explaining it to the robot already levies 
several interesting requirements on the system, and meeting these requirements will 
provide a fairly general and robust capability for SLP. In particular, this task requires 
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the following: (1) The user should be able to provide a sequence of primitive 
commands that will be executed in a particular order. (2) The user should be able to 
specify conditional execution, based on the values of robot sensors. (3) The user 
should be able to tell the robot that at a certain point it should wait for some sensory 
event to occur before proceeding with its sequence execution., as demonstrated 
below. 

In addition, for tasks that become increasingly long and complex, the user may make 
mistakes in his/her initial specification of the task, so the SLP system should allow the user 
to test, "debug" and modify programs. That is, once the user has specified the program, he 
should be able to execute it and - if there is a problem - modify it in a simple and 
straightforward manner. 

On-line Commanding with Repetitive Sub-Tasks: On-line commanding allows the user to 
be responsive to new situations, and to learn him/ herself by taking the robot through a 
given task or tasks. On the other hand, for tasks that are well defined, the user can program 
the robot as defined above. In between these two conditions there may arise situations in 
which during the course of solving a cooperative problem with the robot, the user comes to 
see that despite the "open endedness" of a given problem set, there may repetitive subtasks 
that occur in a larger context. In this type of situation, the human user may want to teach the 
robot about the repetitive part so this can be executed as an autonomous subroutine or 
"macro" while the user still remains in the execution loop for the components that require 
his/her decision. 




Fig. 2. Human-Robot cooperation in a furniture construction task. A. Robot takes a table leg 
from User2. B. Robot hands the leg to Userl. C. Robot holds the table steady while Userl 
attaches the leg. 



Figure 2 illustrates such an HRI scenario that involves two humans and the HRP-2 
cooperating in the construction of a small table. The construction task will involve fixing the 
legs to the surface of the table with wood screws. Userl on the left interacts with the robot 
and with User2 on the right via spoken language. 

Userl will command the robot to prepare to receive one of the table legs that will be passed 
to it by User2. The robot waits until it receives a "continue" signal from Userl, and will then 
pass the leg to Userl who will take the leg, and then ask the robot to hold the table top 
steady allowing Userl to attach the leg to the table. Userl then tells the robot to release the 
table. At this point, the first leg has been fixed to the table, and the "get the leg and attach it" 
sequence can be repeated. This task thus has a repeating subsequence that can be applied to 
each of the four legs of the table. Experiments below address this task. These experiments 
identify the utility of a more complex command structure based on grammatical 
constructions. 
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Fig. 3. Cooperative manipulation task scenario. User asks for screws from locations 
identified by color. While robot holds the plastic "table" (as illustrated) the user can insert 
the screws into the table (see inset). 

Influence of Robot Command Structure on Language: Part of the central principal of 
construction grammar (and the related domain of cognitive linguistics) is that human 
language is made up of a structured inventory of grammatical constructions (Goldberg 
2003). These constructions reflect the meaning of prototypical events that are basic to 
human experience. Thus for example the "ditransitive" construction involves one agent 
transferring an object to another, as in "John gave Mary the ball." From a robotics 
perspective, the action predicate-argument Move(Object, Location) can thus be considered 
as a template for a motor program that allows the use of constructions of the form "Give 
the X to Y". This robot PA involves localizing X and Y, and then grasping X and 
transporting the effector to Y to transfer X to Y. To the extent that such an action PA 
(APA) is built into the repertoire of the robot (or to the extent that it can be built up from 
primitives like localize(X), transport-to(X) etc.) rich grammatical constructions can be 
used to generate and interpret sentences like "give me the red block," or "put the blue 
block next to the red block" for robot control. 

Figure 3 illustrates a simplified version of the table construction scenario that uses the 
Lynx6 arm for cooperative manipulation in a construction task. Experiments below 
examine cooperative construction with the Lynx6 based on lessons learned from the 
HRP-2 experiments, including more complex commands based on grammatical 
constructions. 

4. Implementation of Spoken Language Programming 

Because of the potential influence that the robot itself will have on spoken language 
interaction, we chose to develop a core SLP system, and then to adapt the system to three 
different robot platforms the K-Team Khepera, the LynxMotion Lynx 6 DOF manipulator 
arm, and the Kawada Industries HRP-2 Humanoid. This allows us to explore the navigation 
vs. manipulation dimension, and within the manipulation dimension the Lynx allows for 
rapid prototyping while the HRP-2 allows for a much more robust and human-scale human- 
robot interaction. 

Spoken Language Programming Core: The core SLP system is presented in Figures 4 and 5, 
and Table 1. The core system provides the command, conditional control, and 
programming/ teaching capabilities based on spoken language. 
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Fig. 4. Spoken Language Programming Architecture. The fundamental unit of meaning is 
the predicate-argument (PA) representation. APA - Action Predicate-Argument 
representation. PPA - Perceptual Predicate-Argument representation. Spoken language 
commands are interpreted as individual words or grammatical constructions, and the 
command and possible arguments are extracted in Sentence to PA Meaning. The robot level 
commands (CMD) are extracted by the PA manager. World Model includes specification of 
object locations, and related state information. Robot commands are then issued to the 
Controller to effect the commands. Conditional commands including IF CONDITION .. 
OTHERWISE, and WAIT are implemented as part of the Conditional Execution. Learning- 
related commands LEARN, OK/SAVE-MACRO, RUN MACRO are handled by the 
Sequence Encoding and Retrieval function. Event perception and description, provided by 
PA Meaning to Sentence are implemented in Dominey & Boucher (2005). 

Dialog management and spoken language processing (voice recognition, and synthesis) is 
provided by an "off the shelf" public domain product, the CSLU Rapid Appication 
Development (RAD) Toolkit (http://cslu.cse.ogi.edu/toolkit/). RAD provides a state-based 
dialog system capability, in which the passage from one state to another occurs as a function 
of recognition of spoken words or phrases; or evaluation of Boolean expressions. Figure 5 
illustrates the major dialog states. 
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Fig. 5. Major Dialog States 

In Figure 5, when the system starts in the "Mode Select" state it asks the user "Shall we learn 
by planning or by demonstration ?" 

Learning by Demonstration: If the user replies "demonstration" (note: In all dialogs, 
human responses will be indicated by italics) then the system transitions to the 
"Direct" state, and interrogates the subject with "Now what?" The user can reply with 
one of the robot-specific motor commands, which the robot will execute and then 
again prompt "Now what?" In the course of this direct interaction, if the user 
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determines that there is a repetitive component in the task, as illustrated in the 
discussion of Figure 2, above, the user can invoke the "Learn" command. This 
indicates to the system to begin to store the subsequent commands. When the 
repetitive subsequence has been performed, the user then issues the command "OK" 
and the subsequence is stored, and can then be executed in-line with the command 
"Macro". This learning, in which the user "demonstrates" the performance by 
directing the robot through the execution is illustrated in Exps 3-5. 

Learning by Planning: In Figure 5, when the system starts in the "Mode Select" state it asks 
the user "Shall we learn by planning or by demonstration?" If the user replies "planning" then 
the system transitions to the "Plan" state, and interrogates the subject with "Shall we make a 
new plan, run the old plan, edit it or quit ?" 

If the user replies "New plan" then the system continuously interrogates the user with "now 
what?," with the user responding each time with the next motor command or conditional 
command until the user replies "stop." In this manner, command by command without 
execution, the user specifies the planned program using spoken language. After "stop", the 
plan is stored, and the system returns to the Plan state. Here, the user can now run the plan 
to see how it works. If the user detects that there is a problem with one of the elements in 
the stored plan, at the "Plan" state, he has the option of saying that he wants to "edit" the 
plan. 

Editing Learned Programs: When the system enters the Edit state, it displays the plan 
graphically as a list of commands and corresponding conditions, and then ask the user what 
he wants to modify. The user can then respond with a two word phrase in which the first 
word is "action" or "condition", and the second word is the number of the action or 
condition in question. The system then asks the user for the new value for that component, 
and then again asks what the user wants to modify. When the editing is finished, the system 
then returns to the Plan state where the user can either execute the modified plan, or do 
something else. 

Conditional Execution: In addition to the learning-related commands, we have also 
identified the requirement for commands related to conditional behavior execution and 
control. These commands are specified in Table 1. Perhaps the most obvious of these the "if 
-then-else" construction. 



Conditional Commands 


Effects/ Correspondence 


If condition 


Begin a conditional segment 


Otherwise 


End a conditional segment 


Wait condition 


Interrupt execution until condition is met 


Continue 


User command to proceed when waiting This is one of 
the conditions for wait. 


Learning-Related Commands 




Learn 


Begin macro learning 


Ok/ Stop 


End macro learning 


Macro 


Run stored macro 



Table 1. Conditional and Learning-Related Commands. 



Indeed this is the first concrete example of the use of a multi-word construction. When the 
"if condition" construciton is issued, the user specifies the "if" and the corresponding 
condition in the same utternence (e.g. "if left clear"), and the "if" and the "left clear" 



192 Humanoid Robots, New Developments 

condition are paired together in the stored sequence plan. The user then continues to 
specify the succession of commands that should logically follow in case the "if" condition 
is satisfied. When the user terminates that logical section, he indicates this with the 
conditional command "otherwise". The subsequent succession of commands corresponds 
to those that should be executed in case the condition of the "if" fails. During sequence 
execution in the Run state, when the system encounters an "if", it tests the corresponding 
condition. When the condition is true, the system executes up to the "otherwise" 
statement. When false, it skips to the otherwise, and then resumes execution of the 
subsequent commands. 

The behavioral scenarios above also identified the need for a conditional wait, in which 
the execution of a stored sequence waits for a sensory condition to become true. Like 
the "if", the "wait condition" construction uses the identifier "wait" followed by a 
condition, which is paired with the wait command. Then, during execution of the plan 
in the Run state, when the system encounters a "wait" action, it tests the corresponding 
condition, and waits until that condition is true before proceeding. The "continue" 
condition (indicated by the user saying "continue") is the single robot-independent 
condition. 

Robot Specific Command: Given the robot-independent component of the SLP architecture, 
we now explain how this is adapted to the specific robots. The final behavioral result of a 
robot-specific action command that is issued either directly or as part of a learned plan is the 
execution of the corresponding action on the robot. We new specify the specific commands 
for each of the three robot platforms we used. 

K-Team Khepera: The Khepera (K-Team) is a small two-wheeled robot equipped with 
proximity sensors that has been extensively explored in sensory-motor robotics. Based on 
the requirements for the cooperative task described above, a set of primitive actions, and 
senor states was identified for the Khepera. These robot-specific sensory-motor functions are 
identified in Table 2. 

The Khepera is connected to the SLP system PC via the RS232 serial port. Commands are 
issued to the Khepera via this connection, and the values of the forward, left and right 
position sensors are read from this connection as well. 



Motor Commands 


Resulting Actions 


Explore 


Move forward until obstacle is encountered 


Left 


Turn 90° left 


Right 


Turn 90° right 


Specifiable Conditions 


Corresponding check 


Left clear 


Check left proximity sensor 


Right clear 


Check right proximity sensor 


Obstacle removed 


Check forward proximity sensor 



Table 2. Khepera Action Commands and Sensory Conditions. 

Kawada Industries HRP-2 Humanoid: Based on the preliminary analysis of the table- 
building scenario above, a set of primitive actions was identified for the Kawada 
Industries HRP-2 humanoid robot under the control of the OpenHRP controller (Dominey 
et al. 2007). The HRP-2 has 30 controlled degrees of freedom, 8 of which are used in this 
study. Each of the functions in Table 3 corresponds to a particular posture that is specified 
as the angles for a subset of the 30 DOFs. These actions have been implemented by hand- 
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written python scripts that specify final, hardcoded, joint angles and motion durations for 
the given postures. The python script execution is triggered remotely by the CSLU toolkit, 
and communicates directly with the low-level OpenHRP framework. The motion is 
achieved by linearly interpolating joint angles between the starting and final 
configurations, for each specific action. We have chosen these simple actions in order to 
demonstrate the feasibility of the overall approach in the table-building scenario, and 
more complex functions are currently under development. 



Motor Command 


Resulting Actions 


Prepare 


Move both arms to neutral position, rotate chest to center, 
elevate left arm, avoiding contact with the work surface (5 
DOF) 


Left open 


Open left hand (1 DOF) 


Left close 


Close left hand (1 DOF) 


Give it to me 


Rotate hip to pass the object in left hand to user on the right 
(1 DOF) 


Hold 


Center hip, raise right arm preparing to hold table top (5 
DOF) 


Right open 


Open right hand (1 DOF) 


Right close 


Close right hand (1 DOF) 



Table 3. HRP-2 Action Commands. 

Lynx 6 DOF Manipulator: We also performed SLP experiments with a 6DOF Lynx 6 robot 
manipulator arm. The robot controller is connected to the Pentium PC running RAD via the 
RS232 serial port, and provides access to the 6 joint angles which can be controlled 
individually and in parallel with a time-to-completion parameter. Based on task 
requirements for object manipulation and the miniature task building tasks, we pre- 
localized 8 locations in the workspace along a fixed radius that correspond to the color and 
number locations indicated in Fig. 3. We then manually determined the joint angles for 
taking the manipulator to these regions, above the workspace, and for lowering the 
manipulator to be able to grasp objects at these locations. These postures were then used to 
create the action commands in Table 4. 



Action Command 


Resulting Actions 


Put the Color screw at 
location number. 


Move to location marked by color, pick up the screw, place 
it at location marked by number. Return to neutral 
position. 


Give me the color screw. 


Move to location marked by color, pick up the screw, place 
it at location three. Return to neutral position. 


Open gripper 


Open gripper 


Close gripper 


Close gripper 


Hold the table 


Move gripper to location 3, preparing to hold the table 



Table 4. Lynx 6 Action Commands 

Note that in Table 4 that the "put" and "give me" commands are multiword constructions 
that take arguments. These constructions correspond to the ditransitive construction from 
section II. B. 
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5. Experimental Results Part 1 

We now provide results from a series of experiments that were performed based on the 
scenarios described in Section 3. These experiments were performed with human users who 
were familiar with the system. 

5.1 Programming Conditional Navigation with Khepera 

The current experiment tests whether the user can use spoken language to explain a 
complex conditional task to the robot, based on the scenario described in section 3 and Figl 
in which Userl sends the Khepera robot to User2 who gives it an object to transport back to 
Userl. Getting to User2 requires a conditional choice between two paths based on the 
location of an obstacle that is not known in advance. 

In the following dialog, robot language is specified in italics, and human responses in 
normal text. Technical and explanatory annotation is in <>'s. 

Dialog for Cooperative Navigation and Transportation: 

Shall we learn by planning or by demonstration ? Planning. Shall we make a new plan, run the old 

plan, edit it or quit ? New plan. Now what? <user begins to specify the plan> 

Explore. Now what? If left clear. <the user specifies that if the left is clear, the following 

should be performed > 

Now what? Left. Now what? Explore Now what? Right Now what? Explore. Now what? Left. 

Now what? Left. <the user has now specified the plan segment that takes the robot to 

location L in Fig 1, and turns it around, ready for the return trip> 

Now what? Wait until I say continue. <user specifies that before continuing execution of the 

return trip the robot should wait for a verbal "continue" signal. During this period the 

second human team member places the Lego block on the robot. Speech pattern match on 

"wait *any continue" where *any matches anything. > 

<The following plan segment takes the robot back to the starting location, where the 

principal team member can decide where to place the block. > 

Now what? Explore. Now what? Left. Now what? Explore. Now what? Right. Now what? 

Explore. Now what? Left? Now what? Left. Now what? 

< The planning for the conditional execution of the path to the left is finished. This plan 

segment now specifies the "else' - what should happen if the left arm is blocked. It is similar 

to what happens for the left arm case> 

Otherwise. Now what? Right. Now what? Explore. Now what? Left. Now what? Explore. Now 

what? Left. Now what? Left. Now what? Wait until I say continue. <Robot now at the R, ready 

to come back after User2 places the block. > Now what? Explore. Now what? Left. <This is an 

error that will turn the robot into the wall> Now what? Explore. Now what? Right. Now what? 

Explore. Now what? Left Now what? Left. Now what? Stop. <Stop indicates that the program 

should be stored, and then executed> 

When the stored plan executed we observed that when the obstacle was placed to the right, 
the robot correctly detected that the left was clear, and proceeded along the left arm to the 
location marked L. It then announced that a conditional wait was encountered, and that it 
was waiting for the spoken command "continue". This allowed Userl to verily that User2 
had placed a Lego on the robot, so that he could then say "continue" and the robot 
continued back to the starting location, where Userl could recover the block. We then 
placed the obstacle at the left location. The conditional execution of the "if" determined that 
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the test condition failed, and execution skipped to the "otherwise" section of the plan. In 
execution of this plan segment, an error in the plan was detected. After the robot started to 
return to Userl (after the "continue" command had been issued) the plan incorrectly 
specified that the robot turn left (into the wall) rather than right (towards the center of the 
maze). 

That is, during the specification of a plan of this length, it is possible that the user makes an 
error in a command, particularly in left vs. right distinctions when the robot is moving 
towards the human, in which the human and robot reference frames are reversed. The 
editing capability described above thus allowed the user to listen to the program and see it 
on the screen and then modify the errors by indicating the number of the action in the 
program, and what it should be changed to. 

5.2 Programming Cooperative Manipulation with HRP-2 

We now move from the navigation domain to object manipulation. The current experiment 
exploits the bimanual manipulation capabilities of the HRP-2 humanoid for a human-robot 
cooperation task that involves constructing a piece of furniture (Dominey et al. 2007). 
In the following paragraphs we provide the literal transcription of the human-robot dialog, 
annotated with explanation about the corresponding human and robot behavior and key 
system functions in <> brackets. Again, robot language will be indicated in italics, and 
human language in plain text. 

Phase 1. Spoken language commanding. 

Users 1 and 2 (Fig 3) begin their first interaction with the robot. Userl controls the task via 
spoken language. Because the task is new to him, he proceeds for the moment only by 
commanding the robot, with no programming. 

Dialog : <Robot initiates the interaction with the prompt> I am ready. <the human responsd 
with the command> Prepare. You said prepare? Yes. <Because of possible errors in speech 
recognition, we choose to have an explicit validation on each command as illustrated here>. 
Preparing to grasp. <Robot raises left hand > I am ready. Left open. Opening left hand. <Robot 
opens left hand> I am ready. <User2 places the table leg in the robot's left hand (See Fig 2)> 
Left close. Closing left hand. <Robot closes left hand to grasp the table leg> I am ready. Give it 
to me. Giving you the object. <Robot turns to the right, holding the table leg out to Userl> I 
am ready. Left open. Yes. Opening left hand. <As the Robot opens the hand, Userl reaches for 
and grasps the table leg> I am ready. Hold. <the first of several primitives required for the 
robot to hold the table while Userl attaches the leg.> You said give it to me? No. <This is an 
example of a speech recognition error. > I am ready. Hold. Yes. Getting ready to hold. <The 
robot orients to the work surface and raises its right hand. I am ready. Right open. Opening 
right hand. <With the robot's right hand open, Userl places the table top within the robot's 
grasp>. I am ready. Right close. Closing right hand. I am ready. Wait. Waiting for your signal 
<During this wait period, Userl attaches the leg to the table, and then gives the "continue" 
command when ready. > Continue. I am ready. Right open. Opening right hand. <Userl takes 
the table from the robot and prepares for the next leg> I am ready. 

Phase 2: Programming the Robot 

At this point, Userl has used spoken language tele-operation to enlist the robot's assistance 
in attaching the first leg to the table. Given the knowledge of the task Userl has gained, he is 
now ready to "program" the robot for the repetitive component of the task by using the 
"learn" command. 
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Dialog : Learn. You said learn? Yes. I'll start learning now. I am ready. Prepare. You said ■prepare? 
Yes. Preparing to grasp.... (same sequence as for the first leg) ... Right open. You said right open? 
Yes. Opening right hand. <Userl takes the table>. I am ready. OK. You said OK? Yes. OK we will 
store this plan. I am ready. 

Execution of Stored Program Macro 

Now the stored program can be used via the "macro" command. 

Dialog: Macro. You said macro? Yes. Running the macro. Preparing to grasp. Waiting for your 
signal. Continue. Closing left hand. <Robot takes leg from User2> Giving you the object. 
Opening left hand. <Userl takes leg>. Getting ready to hold. Waiting for your signal. <Userl 
place table in robot's right hand> Continue. Closing right hand. Waiting for your signal <Userl 
attaches the table leg>. Continue. Opening right hand. In line macro finished. I am ready. <Third 
leg is now attached. Userl takes the table. Same procedure with macro for attaching fourth 
leg> 

Performance Analysis: During the assembly of the first two legs, Userl issued 10 distinct 
commands for each of the two legs. Installing legs 1 and 2 took 3min 25sec and 3min 24sec. 
Once the program was learned, for legs 3 and 4, a single command initiated the program, 
and the user was only required to issue 3 "continue" commands in order to indicate to the 
robot that he was ready to proceed. Execution time was reduced to 2:11 and 2:33 
respectively. We performed a statistical analysis of the variance (ANOVA) in indivisual 
action completion times examining the effects of Repetition (i.e. first and second leg in either 
the CMD or PRG mode), and Programming condition (i.e. CMD vs PRG). Only the 
Programming condition had a significant effect on the completion times (ANOVA, 
Programming Effect: F(l,6) = 109, p < 0.0001). 

We performed a second experiment in which the same primitives were used, with 
absolutely no change to the software, to disassemble the table. The use of the 
programming capability for the third and fourth leg (executed in 2:51 and 2:51 
respectively) yielded significant reductions in execution time as compared with the first 
two legs (executed in 3:57 and 4:11 respectively). To compare performance in the two 
experiments we performed a 3 way ANOVA with the factors Experiment (assembly, 
disassembly), Programming (with vs. without, i.e. PRG vs CMD), and Repetition (First vs. 
second repetition in each condition). For the completion times were elevated for the CMD 
vs PRG conditions, i.e. action execution was slower when programming was not used. 
The ANOVA reveled that only the Programming effect was significant (F(l,6) = 277, p < 
0.0001). 

5.3 Grammatical Constructions for Manipulation with Lynx6 Arm 

The programming by command sequencing illustrated in the previous experiment clearly leads 
to improvements in performance and facilitation of the task for the human. Further progress can 
be made by the implementation of higher level action functions that respect the natural 
interaction categories that can arise in these cooperative manipulation tasks. In particular, the 
user who is building the table repeatedly interacts with the robot so that the robot will pass him 
the next leg of the table. The action semantics can be captured in the construction "Give me X", or 
more generally "Move the object at location X to location Y" where X specifies a target object 
location and Y a destination location where that object is to be placed. 

We hypothesized that the availability of this kind of construction would simplify the task 
for the user, and also speed up the execution process. We test this hypothesis with the Lynx 
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6 robot, as a prototype platform before subsequent implementation on the HRP-2. The table 
construction task with the Lynx employed a plastic table surface and four 3cm threaded 
bolts with nuts (see insert Fig.3). These served as legs, and the robot was used to fetch the 
legs and hold the table while the user fixed the legs to the table, as in the previous 
experiment. As illustrated in Fig. 3 we designated 4 areas indicated by color for the four legs 
to be placed initially, and four numbered locations where these legs could be placed using a 
construction of the form: Put the X leg at location Y, where X was one of the four color 
designations and Y one of the four numbered designations (see Fig 3). We also implemented 
a construction of the form : Give me the X leg, which commanded the robot to fetch the leg 
at location color, and place it at location 3 which was just in front of the user. 

Dialog : 

Command an action. Give me the red screw. You want the red screw? Yes. Giving the red screw 

<robot takes screw at red location> to you <and passes it to the user>. Command an action. 

Start learning. <this initiates memory of the subsequent command sequence> Command an 

action. Hold. Preparing to hold. Command an action. Close gripper. Closing gripper. Command an 

action. Wait. Waiting for your signal. <User insertes and tightens screw while robot holds the 

grey plastic "table". See Fig 3> Continue. Command an action. Open. Opening gripper. 

Command an action. Save the macro. You said save the macro? Yes. 

Command an action. Give me the blue screw. You want the blue screw? Yes. Giving the blue 

screw... to you. Command an action. Run the macro. You said run the macro? Yes. Preparing to 

hold. Closing gripper. Waiting for your signal. <User attaches screw while robot holds the table 

steady>. Continue. Opening gripper. 

Command an action. Give me the green screw. You want the green screw? Yes. Giving the green 

screw... to you. Command an action. Run the macro. You said run the macro? Yes. Preparing to 

hold. Closing gripper. Waiting for your signal. Continue. Opening gripper. 

Command an action. Give me the yellow screw. You want the yellow screw? Yes. Giving the 

yellow screw... to you. Command an action. Run the macro. You said run the macro? Yes. 

Preparing to hold. Closing gripper. Waiting for your signal. Continue. Opening gripper. 

The total execution time for the assembly was less than five minutes. With the macro use, 

the user required five spoken commands per leg, and three of these were confirmations of 

commands and the continue signal. The two main commands are "Give me.." and "run the 

macro." With respect to the HRP-2, the availability of the "Give me the X screw" PA 

command eliminated the need to sequence a number of lower level postural commands into 

the macro, which instead was limited to implementing the "hold the table" function. 

6. Programming Shared Behavior by Observation and Imitation 

If spoken language is one of the richest ways to transmit knowledge about what to do, 
demonstration and imitation of observed behavior are clearly another. One of the current 
open challenges in cognitive computational neuroscience is to understand the neural basis 
of the human ability to observe and imitate action. The results from such an endeavor can 
then be implemented and tested in robotic systems. Recent results from human and non- 
human primate behavior, neuroanatomy and neurophysiology provide a rich set of 
observations that allow us to constrain the problem of how imitation is achieved. The 
current research identifies and exploits constraints in these three domains in order to 
develop a system for goal directed action perception and imitation. 
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One of the recurrent findings across studies of human imitation is that in the context of goal 
directed action, it is the goal itself that tends to take precedence in defining what is to be 
imitated, rather than the means (Bekkering et al. 2000, Tomasello et al. 2005). Of course in 
some situations it is the details (e.g. kinematics) of the movement itself that are to be 
imitated (see discussion inCarpetner and Call 2007, Cuijpers et al. 2006), but the current 
research focuses on goal based imitation. This body of research helped to formulate 
questions concerning what could be the neurophysiological substrates for goal based 
imitation. In 1992 di Pellegrino (et al.) in the Rizzolatti lab published the first results on 
"mirror" neurons, whose action potentials reflected both the production of specific goal- 
directed action, and the perception of the same action being carried by the experimenter. 
Since then, the premotor and parietal mirror system has been studied in detail in monkey 
(by single unit recording) and in man (by PET and fMRI) reviewed in Rizzolatti & Craighero 
(2004). 

In the context of understanding imitation, the discovery of the mirror system had an 
immense theoretical impact, as it provided justification for a common code for action 
production and perception. In recent years a significant research activity has used 
simulation and robotic platforms to attempt to link imitation behavior to the underlying 
neurophysiology at different levels of detail (see Oztop et al. (2006) for a recent and 
thorough review, edited volume (Nehaniv & Dautenhahn 2007), and a dedicated special 
issue of Neural Networks (Billard & Schaal 2006)). Such research must directly address the 
question of how to determine what to imitate. Carpenter and Call (2007) distinguish three 
aspects of the demonstration to copy: the physical action, the resulting change in physical 
state, and the inferred goal - the internal representation of the desired state. Here we 
concentrate on imitation of the goal, with the advantage of eliminating the difficulties of 
mapping detailed movement trajectories across the actor and imitator (Cuijpers et al. 2006). 
Part of the novelty of the current research is that it will explore imitation in the context of 
cooperative activity in which two agents act in a form of turn-taking sequence, with the 
actions of each one folding into an interleaved and coordinated intentional action plan. With 
respect to constraints derived from behavioral studies, we choose to examine child 
development studies, because such studies provide well-specified protocols that test 
behavior that is both relatively simple, and pertinent. The expectation is that a system that 
can account for this behavior should extend readily to more complex behavior, as 
demonstrated below. 

Looking to the developmental data, Warneken, Chen and Tomasello (2006) engaged 18-24 
month children goal-oriented tasks and social games which required cooperation. In one of 
the social games, the experiment began with a demonstration where one participant sent a 
wooden block sliding down an inclined tube and the other participant caught the block in a 
tin cup that made a rattling sound. This can be considered more generally as a task in which 
one participant moves an object so that the second participant can then in turn manipulate 
the object. This represents a minimal case of a coordinated action sequence. After the 
demonstration, in Trials 1 and 2 the experimenter sent the block down one of the tubes three 
times, and then switched to the other, and the child was required to choose the same tube as 
the partner. In Trials 3 and 4 during the game, the experimenter interrupted the behavior for 
15 seconds and then resumed. 

Behaviorally, children successfully participated in the game in Trials 1 and 2. In the 
interruption Trials 3 and 4 they displayed two particularly interesting types of response that 
were (a) to attempt to perform the role of the experimenter themselves, and/ or (b) to 
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reengage the experimenter with a communicative act. This indicates that the children had a 
clear awareness both of their role and that of the adult in the shared coordinated activity. 
This research thus identifies a set of behavioral objectives for robot behavior in the 
perception and execution of cooperative intentional action. Such behavior could, however, 
be achieved in a number of possible architectures. 
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Fig. 6. Cooperation System. In a shared work-space, human and robot manipulate objects 
(green, yellow, read and blue circles corresponding to dog, horse, pig and duck), placing 
them next to the fixed landmarks (light, turtle, hammer, etc.). Action: Spoken commands 
interpreted as individual words or grammatical constructions, and the command and 
possible arguments are extracted using grammatical constructions in Language Proc. The 
resulting Action(Agent, Object, Recipient) representation is the Current Action. This is 
converted into robot command primitives (Motor Command) and joint angles (Motor 
Control) for the robot. Perception: Vision provides object location input, allowing action to be 
perceived as changes in World State (State Comparator). Resulting Current Action used for 
action description, imitation, and cooperative action sequences. Imitation: The user 
performed action is perceived and encoded in Current Action, which is then used to control 
the robot under the supervision of Executive Control. Cooperative Games. During 
observations, individual actions are perceived, and attributed to the agent or the other 
player (Me or You). The action sequence is stored in the We Intention structure, that can 
then be used to separately represent self vs. other actions. 



7. Implementing Shared Planning 

In a comment on Tomasello et al (2005) on understanding and sharing intention, Dominey 
(2005) analyses how a set of initial capabilities can be used to provide the basis for shared 
intentions. This includes capabilities to : 1. perceive the physical states of objects, 2. perceive 
(and perform) actions that change these states, 3. distinguish between self and other, 4. 
perceive emotional/ evaluation responses in others, and 5. learn sequences of predicate- 
argument representations. 

The goal is to demonstrate how these 5 properties can be implemented within the 
constraints of the neurophysiology data reviewed above in order to provide the basis for 
performing these cooperative tasks. In the current experiments the human and robot 
cooperate by moving physical objects to different positions in a shared work-space as 
illustrated in Figures 6 and 7. The 4 moveable objects are pieces of a wooden puzzle, 
representing a dog, a pig, a duck and a cow. These pieces can be moved by the robot and the 
user in the context of cooperative activity. Each has fixed to it a vertically protruding metal 
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screw, which provides an easy grasping target both for the robot and for humans. In 
addition there are 6 images that are fixed to the table and serve as landmarks for placing the 
moveable objects, and correspond to a light, a turtle, a hammer, a rose, a lock and a lion, as 
partially illustrated in Figures 6 & 7. In the interactions, human and robot are required to 
place objects in zones next to the different landmarks, so that the robot can more easily 
determine where objects are, and where to grasp them. Figure 6 provides an overview of the 
architecture, and Figure 7, which corresponds to Experiment 6 provides an overview of how 
the system operates. 

Representation: The structure of the internal representations is a central factor determining 
how the system will function, and how it will generalize to new conditions. Based on the 
neurophysiology reviewed above, we use a common representation of action for both 
perception and production. Actions are identified by the agent, the object, and the target 
location to move that object to. As illustrated in Figure 6, by taking the short loop from 
vision, via Current Action Representation, to Motor Command, the system is thus 
configured for a form of goal-centered action imitation. This will be expanded upon below. 
A central feature of the system is the World Model that represents the physical state of the 
world, and can be accessed and updated by vision, motor control, and language, similar to 
the Grounded Situation Model of Mavridis & Roy (2006). The World Model encodes the 
physical locations of objects that is updated by vision and proprioception (i.e. robot action 
updates World Model with new object location). Changes in the World Model in terms of an 
object being moved allows the system to detect actions in terms these object movements. 
Actions are represented in terms of the agent, the object and the goal of the action, in the 
form MOVE(object, goal location, agent). These representations can be used for 
commanding action, for describing recognized action, and thus for action imitation and 
narration, as seen below. In order to allow for more elaborate cooperative activity, the 
system must be able to store and retrieve actions in a sequential structure. 




Fig. 7. Cooperative task of Exp 5-6. Robot arm, with 6 landmarks (Light, turtle, hammer, 
rose, lock and lion from top to bottom). Moveable objects include Dog and Horse. In A-D, 
human demonstrates a "horse chase the dog" game, and successively moves the Dog then 
Horse, indicating that in the game, the user then the robot are agents, respectively. After 
demonstration, human and robot "play the game". In each of E - F user moves Dog, and 
robot follows with Horse. In G robot moves horse, then in H robot detects that the user is 
having trouble and so "helps" the user with the final move of the dog. See Exp 5 & 6. 
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Visual perception: Visual perception is a challenging technical problem. To simplify, 
standard lighting conditions and a small set (n = 10) of visual object to recognize are 
employed (4 moveable objects and 6 location landmarks). A VGA webcam is positioned at 
1.25 meters above the robot workspace. Vision processing is provided by the Spikenet 
Vision System (http://www.spikenet-technology.coni/). Three recognition models for each 
object at different orientations (see Fig. 8.1) were built with an offline model builder. During 
real-time vision processing, the models are recognized, and their (x, y) location in camera 
coordinates are provided. Our vision post-processing eliminates spurious detections and 
returns the reliable (x, y) coordinates of each moveable object in a file. The nearest landmark 
is then calculated. 

Motor Control & Visual-Motor Coordination: While visual-motor coordination is not the 
focus of the current work, it was necessary to provide some primitive functions to allow 
goal directed action. All of the robot actions, whether generated in a context of imitation, 
spoken command or cooperative interaction will be of the form move(x to y) where x is a 
member of a set of visually perceivable objects, and y is a member of the set of fixed 
locations on the work plan. 

Robot motor control for transport and object manipulation with a two finger gripper is 
provided by the 6DOF Lynx6 arm (www.lynxmotion.com). The 6 motors of the arm are 
coordinated by a parallel controller connected to a PC computer that provides transmission 
of robot commands over the RS232 serial port. 

Human users (and the robot) are constrained when they move an object, to place it in one of 
the zones designated next to each of the six landmarks (see Fig 3). This way, when the 
nearest landmark for an object has been determined, this is sufficient for the robot to grasp 
that object at the prespecified zone. 




Figure 8. 1. Vision processing. Above: A. - D. Three templates each for the Dog, Duck, Horse 
and Pig objects at three different orientations. Below, encompassing circles indicate template 
recognition for the four different objects near different fixed landmarks, as seen from the 
camera over the robot workspace. II. Dialog flow of Control 

In a calibration phase, a target point is marked next to each of the 6 fixed landmark 
locations, such that they are all on an arc that is equidistant to the center of rotation of the 
robot arm base. For each, the rotation angle of Joint (the rotating shoulder base) necessary 
to align the arm with that point is then determined, along with a common set of joint angles 
for Joints 1-5 that position the gripper to seize any of the objects. Angles for Joint 6 that 
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controls the closing and opening of the gripper to grasp and release an object were then 
identified. Finally a neutral position to which the arm could be returned in between 
movements was defined. The system was thus equipped with a set of primitives that could 
be combined to position the robot at any of the 6 grasping locations, grasp the 
corresponding object, move to a new position, and place the object there. 
Cooperation Control Architecture: The spoken language control architecture illustrated in 
Fig 8.II is implemented with the CSLU Rapid Application Development toolkit 
(http://cslu.cse.ogi.edu/toolkit/). This system provides a state-based dialog management 
system that allows interaction with the robot (via the serial port controller) and with the 
vision processing system (via file i/o). It also provides the spoken language interface that 
allows the user to determine what mode of operation he and the robot will work in, and to 
manage the interaction via spoken words and sentences. 

Figure 8. II illustrates the flow of control of the interaction management. In the Start state the 
system first visually observes where all of the objects are currently located. From the start 
state, the system allows the user to specify if he wants to ask the robot to perform actions 
(Act), to imitate the user, or to play (Imitate/ Play). In the Act state, the user can specify 
actions of the form "Put the dog next to the rose" and a grammatical construction template 
is used to extract the action that the robot then performs. In the Imitate state, the robot first 
verifies the current state (Update World) and then invites the user to demonstrate an action 
(Invite Action). The user shows the robot one action. The robot re-observes the world and 
detects the action based on changes detected (Detect Action). This action is then saved and 
transmitted (via Play the Plan with Robot as Agent) to execution (Execute action). A 
predicate(argument) representation of the form Move(object, landmark) is used both for 
action observation and execution. 

Imitation is thus a minimal case of Playing in which the "game" is a single action executed 
by the robot. In the more general case, the user can demonstrate multiple successive actions, 
and indicate the agent (by saying "You/ 1 do this") for each action. The resulting intentional 
plan specifies what is to be done by whom. When the user specifies that the plan is finished, 
the system moves to the Save Plan, and then to the Play Plan states. For each action, the 
system recalls whether it is to be executed by the robot or the user. Robot execution takes the 
standard Execute Action pathway. User execution performs a check (based on user 
response) concerning whether the action was correctly performed or not. If the user action is 
not performed, then the robot communicates with the user, and performs the action itself. 
Thus, "helping" was implemented by combining an evaluation of the user action, with the 
existing capability to perform a stored action representation. 

8. Experimental Results Part 2 

For each of the 6 following experiments, equivalent variants were repeated at least ten times 
to demonstrate the generalized capability and robustness of the system. In less than 5 
percent of the trials, errors of two types were observed to occur. Speech errors resulted from 
a failure in the voice recognition, and were recovered from by the command validation 
check (Robot: "Did you say ...?"). Visual image recognition errors occurred when the objects 
were rotated beyond 20° from their upright position. These errors were identified when the 
user detected that an object that should be seen was not reported as visible by the system, 
and were corrected by the user re-placing the object and asking the system to "look again". 
At the beginning of each trial the system first queries the vision system, and updates the 
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World Model with the position of all visible objects. It then informs the user of the locations 
of the different objects, for example "The dog is next to the lock, the horse is next to the 
lion." It then asks the user "Do you want me to act, imitate, play or look again?", and the 
user responds with one of the action-related options, or with "look again if the scene is not 
described correctly. 

Validation of Sensorimotor Control: In this experiment, the user says that he wants the 
"Act" state (Fig 8. II), and then uses spoken commands such as "Put the horse next to the 
hammer". Recall that the horse is among the moveable objects, and hammer is among the 
fixed landmarks. The robot requests confirmation and then extracts the predicate-argument 
representation - Move(X to Y) - of the sentence based on grammatical construction templates. 
In the Execute Action state, the action Move(X to Y) is decomposed into two components of 
Get(X), and Place-At(Y). Get(X) queries the World Model in order to localize X with respect 
to the different landmarks, and then performs a grasp at the corresponding landmark target 
location. Likewise, Place-At(Y) simply performs a transport to target location Y and releases 
the object. Decomposing the get and place functions allows the composition of all possible 
combinations in the Move(X to Y) space. Ten trials were performed moving the four object to 
and from different landmark locations. Experiment 1 thus demonstrates (1) the ability to 
transform a spoken sentence into a Move(X to Y) command, (2) the ability to perform visual 
localization of the target object, and (3) the sensory-motor ability to grasp the object and put 
it at the specified location. In ten experimental runs, the system performed correctly. 
Imitation: In this experiment the user chooses the "imitate" state. As stated above, imitation 
is centered on the achieved ends - in terms of observed changes in state - rather than the 
means towards these ends. Before the user performs the demonstration of the action to be 
imitated, the robot queries the vision system, and updates the World Model (Update World 
in Fig 8. II) and then invites the user to demonstrate an action. The robot pauses, and then 
again queries the vision system and continues to query until it detects a difference between 
the currently perceived world state and the previously stored World Model (in State 
Comparator of Fig 1, and Detect Action in Fig 8. II), corresponding to an object displacement. 
Extracting the identity of the displaced object, and its new location (with respect to the 
nearest landmark) allows the formation of an Move(object, location) action representation. 
Before imitating, the robot operates on this representation with a meaning-to-sentence 
construction in order to verify the action to the user, as in "Did you put the dog next to the 
rose?" It then asks the user to put things back as they were so that it can perform the 
imitation. At this point, the action is executed (Execute Action in Fig 8.II). In ten 
experimental runs the system performed correctly. This demonstrates (1) the ability of the 
system to detect the goals of user-generated actions based on visually perceived state 
changes, and (2) the utility of a common representation of action for perception, description 
and execution. 

A Cooperative Game: The cooperative game is similar to imitation, except that there is a 
sequence of actions (rather than just one), and the actions can be effected by either the user 
or the robot in a cooperative manner. In this experiment, the user responds to the system 
request and enters the "play" state. In what corresponds to the demonstration in Warneken 
et al. (2006) the robot invites the user to start showing how the game works. The user then 
begins to perform a sequence of actions. For each action, the user specifies who does the 
action, i.e. either "you do this" or "I do this". The intentional plan is thus stored as a 
sequence of action-agent pairs, where each action is the movement of an object to a 
particular target location. In Fig 6, the resulting interleaved sequence is stored as the "We 
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intention", i.e. an action sequence in which there are different agents for different actions. 
When the user is finished he says "play the game". The robot then begins to execute the 
stored intentional plan. During the execution, the "We intention" is decomposed into the 
components for the robot (Me Intention) and the human (You intention). 
In one run, during the demonstration, the user said "I do this" and moved the horse from 
the lock location to the rose location. He then said "you do this" and moved the horse back 
to the lock location. After each move, the robot asks "Another move, or shall we play the 
game?". When the user is finished demonstrating the game, he replies "Play the game." 
During the playing of this game, the robot announced "Now user puts the horse by the 
rose". The user then performed this movement. The robot then asked the user "Is it OK?" to 
which the user replied "Yes". The robot then announced "Now robot puts the horse by the 
lock" and performed the action. In two experimental runs of different demonstrations, and 5 
runs each of the two demonstrated games, the system performed correctly. This 
demonstrates that the system can learn a simple intentional plan as a stored action sequence 
in which the human and the robot are agents in the respective actions. 

Interrupting a Cooperative Game: In this experiment, everything proceeds as in the 
previous experiment, except that after one correct repetition of the game, in the next 
repetition, when the robot announced "Now user puts the horse by the rose" the user did 
nothing. The robot asked "Is it OK" and during a 15 second delay, the user replied "no". 
The robot then said "Let me help you" and executed the move of the horse to the rose. Play 
then continued for the remaining move of the robot. This illustrates how the robot's stored 
representation of the action that was to be performed by the user allowed the robot to 
"help" the user. 

A More Complex Game: In order to more explicitly test the intentional sequencing 
capability of the system, this experiment replicates the Cooperative Game experiment but 
with a more complex task, illustrated in Figure 7. In this game (Table 5), the user starts by 
movingO the dog, and after each move the robot "chases" the dog with the horse, till they 
both return to their starting places. 



Action 


User 

identifies 
agent 


User Demonstrates Action 


Ref in 
Figure 7 


1. 


I do this 


Move dog from the lock to the rose 


B 


2. 


You do this 


Move the horse from the lion to the lock 


B 


3. 


I do this 


Move the dog from the rose to the hammer 


C 


4. 


You do this 


Move the horse from the lock to the rose 


C 


5. 


You do this 


Move the horse from the rose to the lion 


D 


6. 


I do this 


Move the dog from the hammer to the lock 


D 



Table 5. Cooperative "horse chase the dog" game specified by the user in terms of who does 
the action (indicated by saying) and what the action is (indicated by demonstration). 
Illustrated in Figure 7. 

As in the simplified cooperative game, the successive actions are visually recognized and 
stored in the shared "We Intention" representation. Once the user says "Play the game", the 
final sequence is stored, and then during the execution, the shared sequence is decomposed 
into the robot and user components based on the agent associated with each action. When 
the user is the agent, the system invites the user to make the next move, and verifies (by 
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asking) if the move was ok. When the system is the agent, the robot executes the movement. 
After each move the World Model is updated. As before two different complex games were 
learned, and each one "played" 5 times. This illustrates the learning by demonstration 
(Zollner et al. 2004) of a complex intentional plan in which the human and the robot are 
agents in a coordinated and cooperative activity. 

Interrupting the Complex Game: As in Experiment 4, the objective was to verify that the 
robot would take over if the human had a problem. In the current experiment this capability 
is verified in a more complex setting. Thus, when the user is making the final movement of 
the dog back to the "lock" location, he fails to perform correctly, and indicates this to the 
robot. When the robot detects failure, it reengages the user with spoken language, and then 
offers to fill in for the user. This is illustrated in Figure 7H. This demonstrates the 
generalized ability to help that can occur whenever the robot detects the user is in trouble. 
These results were presented in Dominey (2007). 

9. Discussion 

This beginning of the 21 st century marks a period where humanoid robot mechatronics and 
the study of human and artificial cognitive systems come in parallel to a level of maturity 
sufficient for significant progress to be made in making these robots more human-like in 
there interactions. In this context, two domains of interaction that humans exploit with great 
fidelity are spoken language, and the visual ability to observe and understand intentional 
action. A good deal of research effort has been dedicated to the specification and 
implementation of spoken language systems for human-robot interaction (Crangle & 
Suppes 1994, Lauria et al. 2002, Severinson-Eklund 2003, Kyriacou et al. 2005, Mavrides & 
Roy 2006). The research described in the current chapter extends these approaches with a 
Spoken Language Programming system that allows a more detailed specification of 
conditional execution, and by using language as a compliment to vision-based action 
perception as a mechanism for indicating how things are to be done, in the context of 
cooperative, turn-taking behavior. 

The abilities to observe an action, determine its goal and attribute this to another agent are 
all clearly important aspects of the human ability to cooperate with others. Recent research 
in robot imitation (Oztop et al. 2006, Nehaniv & Dautenhahn 2007, Billard & Schaal 2006) 
and programming by demonstration (Zollner et al. 2004) begins to address these issues. 
Such research must directly address the question of how to determine what to imitate. 
Carpenter and Call (2007) The current research demonstrates how these capabilities can 
contribute to the "social" behavior of learning to play a cooperative game, playing the game, 
and helping another player who has gotten stuck in the game, as displayed in 18-24 month 
children (Werneken et al. 2006, Werneken & Tomasello 2006). While the primitive bases of 
such behavior is visible in chimps, its full expression is uniquely human. As such, it can be 
considered a crucial component of human-like behavior for robots (Carpenter & Call 2007). 
The current research is part of an ongoing effort to understand aspects of human social 
cognition by bridging the gap between cognitive neuroscience, simulation and robotics 
(Dominey 2003, 2005, et al. 2004, 2006, 2007; Dominey & Boucher 2005), with a focus on the 
role of language. The experiments presented here indicate that functional requirements 
derived from human child behavior and neurophysiological constraints can be used to 
define a system that displays some interesting capabilities for cooperative behavior in the 
context of spoken language and imitation. Likewise, they indicate that evaluation of 
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another's progress, combined with a representation of his/her failed goal provides the basis 
for the human characteristic of "helping." This may be of interest to developmental 
scientists, and the potential collaboration between these two fields of cognitive robotics and 
human cognitive development is promising. The developmental cognition literature lays out 
a virtual roadmap for robot cognitive development (Dominey 2005, Werneken et al. 2006). In 
this context, we are currently investigating the development of hierarchical means-end 
action sequences. At each step, the objective will be to identify the behavior characteristic 
and to implement it in the most economic manner in this continuously developing system 
for human-robot cooperation. 

At least two natural extensions to the current system can be considered. The first involves 
the possibility for changes in perspective. In the experiments of Warneken et al. the child 
watched two adults perform a coordinated task (one adult launching the block down the 
tube, and the other catching the block). At 24 months, the child can thus observe the two 
roles being played out, and then step into either role. This indicates a "bird's eye view" 
representation of the cooperation, in which rather than assigning "me" and "other" agent 
roles from the outset, the child represents the two distinct agents A and B for each action in 
the cooperative sequence. Then, once the perspective shift is established (by the adult taking 
one of the roles, or letting the child choose one) the roles A and B are assigned to me and 
you (or vice versa) as appropriate. 

This actually represents a minimal change to our current system. First, rather than assigning 
the "you" "me" roles in the We Intention at the outset, these should be assigned as A and B. 
Then, once the decision is made as to the mapping of A and B onto robot and user, these 
agent values will then be assigned accordingly. Second, rather than having the user tell the 
robot "you do this" and "I do this" the vision system can be modified to recognize different 
agents who can be identified by saying their name as they act, or via visually identified cues 
on their acting hands. 

The second issue has to do with inferring intentions. The current research addresses one 
cooperative activity at a time, but nothing prevents the system from storing multiple such 
intentional plans in a repertory (IntRep in Fig 6). In this case, as the user begins to perform a 
sequence of actions involving himself and the robot, the robot can compare this ongoing 
sequence to the initial subsequences of all stored sequences in the IntRep. In case of a match, 
the robot can retrieve the matching sequence, and infer that it is this that the user wants to 
perform. This can be confirmed with the user and thus provides the basis for a potentially 
useful form of learning for cooperative activity. 

In conclusion, the current research has attempted to build and test a robotic system for 
interaction with humans, based on behavioral and neurophysiological requirements derived 
from the respective literatures. The interaction involves spoken language and the 
performance and observation of actions in the context of cooperative action. The 
experimental results demonstrate a rich set of capabilities for robot perception and 
subsequent use of cooperative action plans in the context of human-robot cooperation. This 
work thus extends the imitation paradigm into that of sequential behavior, in which the 
learned intentional action sequences are made up of interlaced action sequences performed 
in cooperative alternation by the human and robot. While many technical aspects of robotics 
(including visuomotor coordination and vision) have been simplified, it is hoped that the 
contribution to the study of imitation and cooperative activity is of some value. 
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1 . Abstract 

Most recent humanoid research has focused on balance and locomotion. This concentration is 
certainly important, but one of the great promises of humanoid robots is their potential for 
effective interaction with human environments through manipulation. Such interaction has 
received comparatively little attention, in part because of the difficulty of this task. One of the 
greatest obstacles to autonomous manipulation by humanoids is the lack of efficient collision- 
free methods for reaching. Though the problem of reaching and its relative, pick-and-place, 
have been discussed frequently in the manipulator robotics literature- e.g., (Lozano-Perez et 
al., 1989); (Alami et al., 1989); (Burridge et al., 1995)- researchers in humanoid robotics have 
made few forays into these domains. Numerous subproblems must be successfully addressed 
to yield significant progress in humanoid reaching. In particular, there exist several open 
problems in the areas of algorithms, perception for modeling, and control and execution. This 
chapter discusses these problems, presents recent progress, and examines future prospects. 

2. Introduction 

Reaching is the one of the most important tasks for humanoid robots, endowing them with 
the ability to manipulate objects in their environment. Unfortunately, getting humanoids to 
reach efficiently and safely, without collision, is a complex problem that requires solving 
open subproblems in the areas of algorithms, perception for modeling, and control and 
execution. The algorithmic problem requires the synthesis of collision-free joint-space 
trajectories in the presence of moving obstacles. The perceptual problem, with respect to 
modeling, is comprised of acquiring sufficiently accurate information for constructing a 
geometric model of the environment. Problems of control and execution are concerned with 
correcting deviation from reference trajectories and dynamically modifying these 
trajectories during execution to avoid unexpected obstacles. This chapter delves into the 
relevant subproblems above in detail, describes the progress that has been made in solving 
them, and outlines the work remaining to be done in order to enable humanoids to perform 
safe reaching in dynamic environments. 

3. Problem statement 

The problem of reaching is formally cast as follows. Given: 
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1. a world = K J 

2. the current time t ; T is then defined as the interval [t , °o] 

3 . a robot 

4. a smooth manifold ( - R-> called the state space of; let K '. — ► be a function that 
maps state-space to the robot's configuration space 

5. the state transition equation is .V = f{x, 11, t), where * £ and u : T — ► generates a 
vector of control inputs (,— ~pv'-) as a function of time 

6. a nonstationary obstacle region (J) (~ , Vf 6 T; oisfftj ^t € T i s then the 
projection of obstacles in the robot's configuration space into state-space (i.e., 
* _1 ( „*(')) = Mi)and F „m= - .^H). 

?• C 1R 6 i s tne reachable workspace 1 of 

8. a direct kinematics function, F : — > that transforms robot states to operational-space 
configurations of one of the robot's end effectors 

9. a set of feasible operational-space goal functions of time, G such that Vg E G, g : T — > 

10. a feasible state-space Boolean function S : T x g x — > 0,1 where g £ G 

11. loc jre& the state of the robot at to 

generate the control vector function u(.) from time t > to such that x t c free(f) where x t = 
3co + Jl f(x,u(t),t} At, for t > to and there exists a time tj for which min- 6 e F(.v(f,-)) - g(t;) < 

c and i5(ii,ff('i),^((,)) = 1 for all f, > fy, or correctly report that such a function u(.) does not 

exist. 

Informally, the above states that to solve the reaching problem, the commands sent to the 

robot must cause it to remain collision-free and, at some point in the future, cause both the 

operational space distance from the end-effector to one of the goals to remain below a given 

threshold 8 and the state-space of the robot to remain in an admissable region. 

The implications of the above formal definition are: 

• The state transition function /(.) should accurately reflect the dynamics of the robot. 
Unfortunately, due to limitations in mechanical modeling and the inherent 
uncertainty of how the environment might affect the robot, /(.) will only 
approximate the true dynamics. Section 4.3 discusses the ramifications of this 
approximation. 

• The robot must have an accurate model of its environment. This assumption will 
only be true if the environment is instrumented or stationary. The environments in 
which humanoids are expected to operate are dynamic (see #6 above), and this 
chapter will assume that the environment is not instrumented. Constructing an 
accurate model of the environment will be discussed in Section 4.2. 

• The goals toward which the robot is reaching may change over time; for example, 
the robot may refine its target as the robot moves nearer to it. Thus, even if the 
target itself is stationary, the goals may change given additional information. It is 
also possible that the target is moving (e.g., a part moving on an assembly line). 
The issue of changing targets will be addressed in Section 4.1. 

• Manipulation is not explicitly considered. It is assumed that a separate process can 
grasp or release an object, given the operational-space target for the hand and the 
desired configuration for the fingers (the Boolean function <J(.) is used to ensure 



1 The reachable workspace is defined by Sciavicco & Siciliano (2000) to be the region of operational-space 
that the robot can reach with at least one orientation. 
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that this latter condition is satisfied). This assumption is discussed further in the 
next section. 

3 Related work 

A considerable body of work relates to the problem defined in the previous section yet does 
not solve this problem. In some cases, researchers have investigated similar problems, such 
as developing models of human reaching. In other cases, researchers have attempted to 
address both reaching and manipulation. This section provides an overview of these 
alternate lines of research, though exhaustive surveys of these areas are outside of the scope 
of this chapter. Humanoids have yet to autonomously reach via locomotion to arbitrary 
objects in known, static environments, much less reach to objects without collision in 
dynamic environments. However, significant progress has been made toward solving this 
problems recently. This section concludes with a brief survey of methods that are directly 
applicable toward solving the reaching problem. 

3.1 Models of reaching in neuroscience 

A line of research in neuroscience has been devoted to developing models of human 
reaching; efficient, human-like reaching for humanoids has been one of the motivations for 
this research. Flash & Hogan (1985), Bullock et al. (1993), Flanagan et al. (1993), Crowe et al. 
(1998) and Thor-oughman & Shadmehr (2000) represent a small sample of work in this 
domain. The majority of neuroscience research into reaching has ignored obstacle 
avoidance, so the applicability of this work toward safe humanoid reaching has not been 
established. Additionally, neuroscience often considers the problem of pregrasping, defined 
by Arbib et al. (1985) as a configuration of the fingers of a hand before grasping such that 
the position and orientation of the fingers with respect to the palm's coordinate system 
satisfies a priori knowledge of the object and task requirements. In contrast to the 
neuroscience approach, this chapter attempts to analyze the problem of humanoid reaching 
from existing subfields in robotics and computer science. Recent results in the domains of 
motion planning, robot mapping, and robot control architectures are used to identify 
remaining work in getting humanoids to reach safely and efficiently. This chapter is 
unconcerned with generating motion that is natural in appearance by using pregrasping 
and human models of reaching, for example. 

3.2 Manipulation planning 

Alami et al. (1997), Gupta et al. (1998), Mason (2001), and Okada et al. (2004) have 
considered the problem of manipulation planning, which entails planning the movement of a 
workpiece to a specified location in the world without stipulating how the manipulator is to 
accomplish the task. Manipulation planning requires reaching to be solved as a subproblem, 
even if the dependence is not explicitly stated. As noted in LaValle (2006), existing research 
in manipulation planning has focused on the geometric aspects of the task while greatly 
simplifying the issues of grasping, stability, friction, mechanics, and uncertainty. The 
reaching problem is unconcerned with grasping (and thereby friction) by presuming that 
reaching and grasping can be performed independently. The definition provided in Section 
2 allows for treatment of mechanics (via/(.), the state transition function) and stability and 
uncertainty (by stating the solution to the problem in terms of the observed effects rather 
than the desired commands). Additionally, the problem of reaching encompasses more 
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tasks than those used in pick-and-place; for example, both pointing and touching can be 
considered as instances of reaching. 

A late development by Stilman & Kuffner, Jr. (2007) addresses manipulation planning 
amongst movable obstacles. The reaching problem as defined in Section 2 permits the 
obstacles to be movable by the humanoid. Many of the issues described in this chapter (e.g., 
constructing a model of the environment, monitoring execution, etc.) need to be resolved to 
fully explore this avenue, but the ability to move obstacles as necessary will certainly admit 
new solutions to some instances of the reaching problem. 

3.3 Recent work directly applicable to humanoid reaching 

Recent work in humanoid robotics and virtual humans is directly applicable toward 
efficient, safe humanoid reaching. For example, work by Brock (2000) permits reaching in 
dynamic environments by combining a planned path with an obstacle avoidance behavior. 
More recent work by Kagami et al. (2003) uses stereo vision to construct a geometric model 
of a static environment and motion planning and inverse kinematics to reach to and grasp a 
bottle with a stationary humanoid robot. Liu & Badler (2003); Kallmann et al. (2003); 
Bertram et al. (2006) and Drumwright & Ng-Thow-Hing (2006) focused on developing 
algorithms for humanoid reaching; the algorithms in the latter two works are probabilistically 
complete, while the algorithms in (Liu & Badler, 2003) and (Kallmann et al., 2003) are not 
complete in any sense. All four works assumed static environments, perfect control and 
holonomic constraints. 

4. Outstanding issues 

This section discusses the issues that remain to solve the reaching problem, as follows: 

1 . Constructing an accurate model of the environment 

2. Planning collision-free motions in dynamic environments 

3. Correcting deviation from the desired trajectory due to imperfect control 

4. Avoiding both fixed and moving obstacles during trajectory execution 

The first item has received the least research attention to date and therefore includes the 
majority of open problems in collision-free humanoid reaching. Section 4.2 discusses 
progress and prospects in this area. Planning collision-free motions, at least in static 
environments, has received considerable attention; Section 4.1 discusses why this problem is 
challenging from an algorithmic standpoint and addresses extensions to dynamic 
environments. Finally, correcting deviation from the planned trajectory and avoiding 
obstacles during trajectory execution are key to reach the target in a safe manner. Section 4.3 
discusses these two issues. 

4.1 Algorithmic issues 

The best studied aspect of the reaching problem is the algorithmic component, which is an 
extension to the general motion planning introduced below. Section 4.1.1 formally relates 
the problem of reaching to the general motion planning problem, and analyzes the 
complexity of the latter. Section 4.1.2 introduces sample-based motion planning, a paradigm 
for circumventing the intractability of motion planning; the following section discusses the 
extension of a popular sample-based motion planner to respect differential constraints. 
Finally, Sections 4.1.4-4.1.8 discuss motion planning issues highly relevant to humanoid 
reaching, namely planning under uncertainty, potential incompleteness resulting from 
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multiple inverse kinematics solutions, planning to nonstationary goals, and planning in 

dynamic environments. 

Researchers have investigated ways for planning collision-free motions from one 

configuration to another since the introduction of the Piano Mover's Problem, also known as 

the Static Mover's Problem Reif (1979). The Piano Mover's problem can be stated as follows 

(excerpted from LaValle (2006)). 

Given: 

1 . a world , where = R 2 or = 1R 3 

2. a semi-algebraic obstacle region c: 

3 . a collection of m semi-algebraic links of a robot, 1, 2, ... m 

4. the configuration space of the robot; free can then be defined as the subset of 
configuration space which does not cause the robot's geometry to intersect with 
any obstacles 

5. the initial configuration of the robot, qj 

6. the goal configuration of the robot, qc 

generate a continuous path t : [0,1] — > f„ c such that r(0) = q; and x(l) = qc or correctly report 
that such a path does not exist. The term semi-algebraic refers to a geometric representation 
that is composed by Boolean operations on implicit functions. 

4.1.1 Complexity of motion planning 

The definition of the Piano Mover's Problem is quite similar to the problem formulation for 
reaching at the beginning of this chapter. Indeed, an instance of the reaching problem can be 
transformed into an instance Piano Mover's Problem given the following constraints: 

• the obstacle region, is stationary (i.e., (f;) = (f;),Vf;, tj e T) 

• = from which follows &M = r,V* e and xo = qi 

• = 

• G consists of a single element, g, which is nonstationary, and there exists only one 
robot configuration qc that results in g (i.e., F" 1 (g) = qc) 

• j. h f{x,it(t),t) dt = ii{tf,) — xlt,,} (implies that the command is the new robot 
state) 

Reif (1979) showed that the Piano Mover's Problem is PSPACE-complete, implying NP- 
hardness. Additionally, the best known algorithm for solving the Piano Mover's problem 
(complexity-wise) is Canny's Roadmap Algorithm (Canny, 1993), which exhibits running- 
time exponential in the configuration space; aside from being intractable, the algorithm is 
reportedly quite difficult to implement (LaValle, 2006). Later work by Reif & Sharir (1994) 
proved that planning motions for a robot with fixed degrees-of-freedom and velocity 
constraints in the presence of moving obstacles with known trajectories is PSPACE-hard; 
thus, the constraints that were imposed transforming the reaching problem into the Piano 
Mover's Problem are unlikely to make the former problem easier. 

4.1.2 Sample-based motion planning 

The paradigm of sample-based motion planning was introduced with the advent of the 
randomized potential field (Barraquand & Latombe, 1991). Sample-based algorithms trade 
algorithmic completeness for excellent average-case performance and ease of 
implementation. In fact, completeness was not cast aside; rather, it was relaxed to lesser 
constraints, probabilistic completeness and resolution completeness. It is said that an algorithm is 
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probabilistically complete if the probability of finding a solution, if it exists, tends to unity as 
the number of samples increase. Similarly, a motion planning algorithm is resolution 
complete if a solution, if it exists, will be found in finite time given sufficiently dense 
sampling resolution over the domain of configuration space. Note that neither weaker form 
of completeness permits this class of algorithm to definitively state that a path does not 
exist. Finally, the underlying approach of sample-based planning is quite different from 
adapting classical search algorithms (i.e., A*, best-first, etc.) to a discretized grid; such an 
approach is generally intractable due to the combinatorial explosion of configuration space. 
The most popular sample-based algorithm is currently the rapidly-exploring random tree 
(RRT) LaValle (1998). RRTs are popular due to their inherent bias toward the largest 
Voronoi region of configuration space (i.e., the largest unexplored region) during 
exploration. Efficient exploration is critical for sample-based algorithms because their 
running times generally increase linearly with the number of samples. 

4.1.3 Motion planning under differential constraints 

In addition to the advantage of efficient exploration, RRTs allow for planning under 
differential (e.g., nonholonomic) constraints, through kinodynamic planning. Kinodynamic 
planning plans in the control space, rather than in configuration space, and is therefore able 
to respect dynamic constraints. Kinodynamic planning theoretically permits motion to be 
planned for hu-manoids, which generally use bipedal locomotion and are nonholonomically 
constrained. As might be expected, kinodynamic planning is harder computationally than 
in the unconstrained case; additionally, kinodynamic planning requires a model of the 
system's dynamics and a control method for solving a two-point boundary value problem 2 . 
Planning directly in the state-actuator space of the humanoid is infeasible: the motion planner 
would not only have to avoid obstacles but also balance the humanoid and provide locomotion. 
An accurate model of the robot's dynamics would be required as well. Alternatively, planning 
could occur over the robot's configuration space augmented with planar position and 
orientation of the base. Constraints would be enforced kinematically rather than dynamically, 
and a trajectory rescaling mechanism could be used to enforce the dynamic constraints after 
planning. For example, kinematic constraints could be used to allow the base to translate or 
rotate, but not translate and rotate simultaneously. Planning in this augmented configuration 
space would require a locomotion controller that translates desired differential position and 
orientation of the base into motor commands. Once a plan were constructed in the augmented 
configuration space, the locomotion controller and joint-space controllers would generate the 
proper motor commands for a given configuration space trajectory. 

Finally, it should be noted that if humanoids moved on holonomic bases, simpler methods 
could potentially be employed for humanoid reaching. For example, Maciejewski & Klein 
(1985) combines inverse kinematics with obstacle avoidance for redundant manipulators 
under holonomic constraints; some mobile manipulators fit into this category. Though the 
method of Maciejewski and Klein can cause the robot to become trapped in local minima, it 
presents minimal computational requirements. 



2 In the context of this work, the two-point boundary problem can be considered to be the problem of 
getting from a given state to a desired state under nonholonomic constraints. For example, the problem 
of parallel parking a car can be considered a two-point boundary problem: a series of actions is required 
to move the car into a parking space, even if only a simple translation is required (e.g., the car initially is 
aligned laterally with the parking space). 
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4.1.4 Motion planning for humanoid reaching 

The RRT has been applied successfully to reaching for humanoids in virtual environments in 
(Kuffner, Jr., 1998); (Liu & Badler, 2003); (Kallmann et al, 2003); (Bertram et al, 2006) and 
(Drumwright & Ng-Thow-Hing, 2006), among others. Additionally, the RRT has been applied 
to reaching for an embodied humanoid by Kagami et al. (2003), although, as stated in Section 
2, the environment was considered to be stationary and locomotion was not utilized. 
Unfortunately, several assumptions of current, effective motion planning algorithms are 
unsuitable for humanoid reaching, as follows: 

1. An accurate model of the environment and a precise mechanism for control are 
required. 

2. A single goal configuration is assumed. 

3. Targets for reaching do not change over time. 

4. The environment is static. 

These issues are explored further in the remainder of this section. 

4.1.5 Planning under uncertainty 

This chapter generally assumes that a sufficiently accurate model of the environment can be 
constructed and that the robot can be controlled with some degree of accuracy. However, 
these assumptions are often unrealistic in real-world robotics. It would be advantageous to 
construct plans that would maximize the distance from the robot to obstacles, for example, 
to minimize deleterious effects of uncertainty. LaValle & Hutchinson (1998) have explored 
the issue of planning under sensing and control uncertainty; however, their use of dynamic 
programming (Bellman, 1957) has restricted applications to planar robots. As an alternative, 
Lazanas & Latombe (1995) proposed an approach based on landmarks, regions of the state 
space where sensing and control are accurate. The assumption that such regions exist is 
significant. The limited application of these two methods illustrates the difficulty of 
planning while maximizing objectives, which is known as optimal planning (LaValle, 2006). 
Finally, Baumann (2001) proposes a planning method that iteratively modifies a trajectory; a 
fitness function judges the quality of the modified trajectory versus the original, in part 
based on distance to obstacles. However, this work has yet to be subjected to peer-review. 

4.1.6 Incompleteness resulting from multiple IK solutions 

A single operational-space goal generally corresponds to an infinite number of robot 
configurations, given the hyper-redundant degrees-of -freedom of most humanoids 3 . It is possible 
that collision-free paths exist only for a subset of the space of collision-free inverse kinematics 
solutions. Drumwright & Ng-Thow-Hing (2006) addressed that problem by maintaining a list of 
goal configurations that is continually grown using inverse kinematics; the motion planner 
frequently attempts to reach arbitrary goals in the list. Thus, the motion planner can avoid 
becoming stuck in planning to unreachable goals by not committing to any particular goal. 

4.1.7 Motion planning to nonstationary goals 

Planning trajectories to nonstationary goals has received little attention in the motion 
planning community; however, two concerns are evident. The goals may change with 



3 We assume that all available degrees-of-freedom are used for reaching, rather than for achieving 
secondary tasks like singularity avoidance, e.g., (Tanner & Kyriakopoulos, 2000). 
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increasing sensory data, leading to new estimates of the target's position and orientation. 
Additionally, the target itself may be moving, perhaps with a predictable trajectory. 
The first issue is readily solvable using existing methods. If already in the process of 
planning, the goal can be replaced without ill effects: the results from sample-based 
planning methods will not be invalidated. If a plan has been determined, replanning can be 
performed using the results already determined (i.e., the roadmap, tree, etc.) to speed 
computation. Alternatively, inverse kinematics can be utilized to modify the generated plan 
slightly, somewhat in the manner used for motion retargeting by Choi & Ko (2000). 
Moving targets are far more difficult to manage. Current sample-based planning methods 
have not focused on this problem, and the complexity of adapting existing methods to this 
purpose is unknown. Again, it is imaginable that existing plans could be modified using 
inverse kinematics, though replanning may be required if the target is moving quickly. 
Alternatively, Brock's method (discussed in Section 4.3) could potentially be utilized with 
some modifications toward this purpose. 

4.1.8 Motion planning in dynamic environments 

A simple means to address motion planning in dynamic environments adds time as an extra 
dimension of configuration-space. As LaValle (2006) notes, the planning process must 
constrain this extra dimension to move forward only. This approach of augmenting the 
configuration space with time can fail because the dynamics of the robot are not considered: 
it may not be possible or advisable for a robot to generate sufficient forces to follow the 
determined trajectory. An alternative to this approach is to use kinodynamic planning, as 
described in Section 4.1.3 to plan in the control space of the robot. However, kinodynamic 
planning does not appear to be applicable to configuration spaces above twelve dimensions, 
in addition to the difficulties with this approach described in Section 4.1.3. 
Dynamic replanning, which refers to fast replanning as needed, is an alternative to methods 
which plan around dynamic obstacles. Dynamic replanning does not require the trajectories 
of dynamic obstacles to be known (dynamic obstacles are treated as stationary), and thus 
avoids the additional complexity of planning around these obstacles. Dynamic replanning 
may be the best option for the high-dimensional configuration spaces of humanoids. 
Kallmann & Mataric (2004) has explored online modification of existing sample-based 
roadmaps for virtual humanoids; unfortunately, that work indicated that the modification is 
likely no faster than generating a new plan. However, newer algorithms by Ferguson & 
Stentz (2006) and Zucker et al. (2007), also based on RRTs, have proven adept at replanning 
in nonstationary environments with large configuration spaces by modifying existing trees 
dynamically. Additionally, Drumwright & Ng-Thow-Hing (2006) have indicated that even 
planning anew in real-time for humanoids using a slightly modified RRT algorithm is 
nearly performable using current computers. 

Recent work by Jaillet & Simeon (2004) and van den Berg & Overmars (2005) has taken a 
two-phase approach to motion planning in dynamic environments. The first phase entails 
constructing a probabilistic roadmap (Kavraki et al., 1996) over the persistent parts of the 
environment offline. In the online phase, a graph search algorithm finds a feasible path 
around dynamic obstacles. These methods are compelling because the bulk of computation, 
constructing the roadmap, is performed offline. Nevertheless, further research is required to 
determine efficient ways to update the roadmap as the environment changes (e.g., doors are 
opened, furniture is moved, etc.) before these methods can be used for humanoid reaching, 
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Regardless of the method employed, a significant concern is that the process of planning can 
itself cause possible solutions to disappear, because planning occurs in real-time. An 
additional significant concern is the requirement of many methods that the trajectories of 
the obstacles to be known a priori; filtering techniques- e.g., Kalman filters Kalman & Bucy 
(1961)- might permit short-term predictions, but long-term predictions will be problematic 
unless the obstacles follow balistic trajectories (which will prove difficult for the robot to 
avoid anyhow). 

4.1.9 Summary 

Substantial progress has been made in motion planning in the past decade, leading to 
tractable solutions of high-dimensional planning problems. Indeed, in the areas of planning 
to nonstationary goals and planning in dynamic environments, researchers are on the cusp 
of solutions that are viable for humanoid reaching. However, considerable work still 
remains in the area of planning under uncertainty. 

4.2 Perception for modeling issues 

We ignore the problems of object recognition and object pose determination, which are 
beyond the scope of this paperm, and focus on the perceptual issues related to modeling the 
environment for purposes of collision detection 4 , assuming that the humanoid is equipped 
with a directional range finder. The problem of simultaneous localization and mapping 
(SLAM) is well studied in the mobile robotics community, where significant success has 
been achieved at developing methods to construct maps of 2?D environments; some success 
has been achieved building three-dimensional maps as well. The natural inclination is to see 
whether SLAM techniques for mobile robots can be adapted to the problem of environment 
modeling toward humanoid reaching. Human environments are dynamic, humanoids 
manipulate objects (thus changing the world), and environment modeling is conducted with 
respect to planning; these challenges make current SLAM methods for mobile robotics 
difficult to utilize for humanoid reaching. Indeed, significant obstacles remain before 
humanoids can construct environment models suitable for motion planning. The remainder 
of this section discusses the relevant issues toward this purpose, namely representation, 
localization, exploration, and nonstationary environments. 

4.2.1 Representation of environment model 

There exist several possible representations for modeling the environment, including 
volumetric point sets (Thrun et al v 2003); occupancy grids (Moravec & Elfes, 1985), (Elfes, 
1989); 3D models (Teller, 1998), (Kagami et al., 2003) and feature-based maps (Kuipers et al, 
1993). However, the application of reaching presents several requirements. First, the 
representation must allow for fast intersection testing. Second, the representation must be 
able to efficiently manage the prodigious amounts of data that 3D range scans generate. 
These first two requirements exclude volumetric point sets. Fast updates of the 
representation from sensory data are also required. This stipulation excludes the use of 3D 
models, which require considerable post-processing including iso-surface extraction via the 



4 Alfhough the goal functions for the reaching problem are given in the global coordinate frame, it is 
quite natural to use ego-centric frames instead. As a result, localization is likely not required to perform 
reaching, except for effectively constructing the environment model. 
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Marching Cubes algorithm (Lorensen & Cline, 1987) or stitching (Turk & Levoy, 1994) and 
aligning and registering (Mayer & Bajcsy 1993); (Pito, 1996). In addition to the requirements 
listed above, the ability to readily extract high-level features from the representation for use 
with localization, described in Section 4.2.2, would be advantageous. 

Drumwright et al. (2006) used an octree, which may be considered as an extension of 
occupancy grids, for modeling environments for reaching. The octree representation results 
in efficient storage, permits fast updates to the representation from range data, and is 
capable of performing fast intersection testing with the robot model. Drumwright et al. 
(2006) assumed perfect localization, and we are unaware of work that extracts features from 
octrees (or their two-dimensional equivalent, quadtrees) for the purpose of localization. 
However, there is precedent for feature extraction from octrees, e.g., (Sung et al., 2002). 
An alternative to the octree representation would use high-level features (e.g., landmarks, 
objects, or shapes) as the base representation. Such features would serve well for input to 
one of the popular methods for simultaneous localization and mapping (SLAM) such as 
(Smith & Cheeseman, 1985); (Smith et al., 1990) or (Fox et al., 1999). Recognition of objects in 
the context of mapping and localization has been limited generally to those objects which 
are of interest to mobile robots, including doors (Avots et al., 2002), furniture (Hahnel et al., 
2003), and walls (Martin & Thrun, 2002). Additionally, representations used typically 
involve geometric primitives, which fail to realize the potential benefit of using identified 
objects to infer occluded features. The difficulty of performing object recognition in 
unstructured environments makes the near-term realization of this benefit unlikely. 

4.2.2 Localization 

The humanoid must be able to localize itself (i.e., know its planar position and orientation) 
with respect to the environment, if not globally. Recent work by Fox et al. (1999) indicates 
that localization can be performed successfully with range data even in highly dynamic 
environments. This work, as well as other probabilistic approaches, typically can provide 
measures of certainty of their localization estimates. The estimated variance can be used to 
filter updates of the environment model; the environment model will be updated only if the 
certainty of the prediction is above a given threshold. Alternatively, localization accuracy 
can be quite high, e.g., on the order of centimeters Yamamoto et al. (2005), if the 
environment is modestly instrumented. 

4.2.3 Nonstationary environments 

Modeling nonstationary environments requires the management of three types of obstacles, 
which we call dynamic, movable, and static. Dynamic obstacles are capable of moving on their 
own (e.g., humans, cars, etc.), while movable obstacles (e.g., cups, furniture, books, etc.) 
must be moved by a dynamic obstacle. Static obstacles, such as walls, are incapable of 
movement. The three cases would ideally be managed separately, but a possible solution 
could treat movable and static obstacles identically yet allow for gradual changes to the 
environment. 

Three recent approaches have made strides toward modeling nonstationary environments. 
The first approach, introduced independently by Wang & Thorpe (2002) and Hahnel et al. 
(2002), attempts to identify dynamic objects and filter them from the sensory data. The 
second approach, (Hahnel et al., 2003), uses an expectation-maximization (Dempster et al., 
1977) based algorithm to repeatedly process the sensor readings and predict whether a 
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given feature belongs to a dynamic obstacle; dynamic obstacles are not incorporated into the 
constructed map. Finally, Biswas et al. (2002) attempt to model dynamic obstacles as 
movable rigid bodies using an occupancy grid and an expectation-maximization based 
algorithm. Unfortunately, the first two approaches described above are ill-suited for 
movable obstacles, while the second approach is unable to handle dynamic obstacles. 

4.2.4 Exploration 

Exploration of the environment to facilitate reaching requires both directed locomotion and 
directed/ active sensing. Both exploration and active sensing have been studied extensively 
in the context of mobile robot mapping, it is unlikely that this research is fully applicable 
toward our problem. Exploration in the mapping context is conducted in a greedy manner: 
movement is directed to build a complete map of the environment with respect to some 
criterion, e.g., minimum time (Yamauchi, 1997), minimum energy expenditure (Mei et al., 
2006), maximum safety (Gonzalez-Banos & Latombe, 2002), etc. In contrast, the reaching 
problem requires a balance between exploration and exploitation: the environment must be 
sufficiently explored, but not at the expense of finding a valid collision-free path. The 
balance between exploration and exploitation precludes the use of active sensing 
frameworks, such as that described by Mihaylova et al. (2002), to guide exploration. Finally, 
moving obstacles must be avoided during exploration, perhaps by using reactive 
approaches like VFH Borenstein & Koren (1989) or potential fields Khatib (1986). 
Choset & Burdick (2000) refer to environmental exploration in the context of motion 
planning sensor based motion planning, and introduced a new data structure, the hierarchical 
generalized voronoi graph, to solve this problem. Unfortunately, their work was targeted 
toward mobile robots, and it appears highly unlikely to scale to humanoid robots with high- 
dimensional configuration spaces. 

Researchers have yet to propose methods to perform directed exploration with respect to 
motion planning for humanoid robots. One possibility is to adapt the concepts employed for 
mobile robot exploration to humanoids. For exanvple,frontier-based exploration Yamauchi (1997), 
which directs the robot to move to the area between open space and uncharted territory (i.e., 
the frontier), could be combined with a heuristic to select frontiers nearest the target. 

4.2.5 Summary 

This section covered the relevant issues necessary to construct a model of the environment 
for reaching. The issues of localization and selecting a proper representation for modeling 
the environment seem manageable through existing research. However, considerable work 
remains in the areas of modeling nonstationary environments and exploration for motion 
planning. 

4.3 Control and Execution issues 

Ideally, kinematic commands could be issued to a humanoid, the robot would execute those 
commands with perfect precision, and obstacles would not move. Those assumptions do not 
hold in the real- world: humanoids are typically holonomically constrained, controllers are 
imperfect, robots exhibit some degree of inaccuracy due to mechanical tolerances, the model 
of the environment may be flawed, and obstacles can appear suddenly. 

The process of getting a manipulator to reach to an operational space target generally 
consists of the following sequence of steps: 
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1. A motion planner generates kinematic commands; the environment is often 
considered to be static for manipulators, so motion planning may consist of using 
cubic or quinticsplines to form a trajectory with via points. 

2. The kinematic commands are input to a mechanism for rescaling the trajectory. The 
trajectory is rescaled to a longer duration, if necessary, to permit the robot to 
execute the trajectory without deviating from the prescribed path and prevent 
possible damage to the robot. 

3. The commands are continually fed to a controller, which translates kinematic 
commands into actuator commands. 

This process runs counter to best practices in mobile robotics because it provides no means 
to monitor the execution of the trajectory and react to exigent events, such as humans 
getting in the way. For humanoids, the process above must provide a means to supervise 
the robot's execution of the plan to avoid obstacles and correct deviation from the plan. 
This section discusses the causes, effects, and remedies for controller deviation and examines 
execution monitoring, which attempts to avoid obstacles while driving the robot to its target. 

4.3.1 Controller deviation 

Robot controllers that minimize command error generally incorporate a model of the robot's 
dynamics; in particular, an inverse dynamics model of the robot is utilized to map desired 
accelerations to actuator torques. Controllers that use inverse dynamics models will still 
exhibit some deviation from the desired trajectory due to approximations of the robot's true 
dynamics (e.g., Coulombic friction, infinitely rigid links, etc.) Of far more concern, however, 
is that an algorithm does not exist for computing the inverse dynamics for robots with 
floating-bases experiencing contact constraints (e.g., for humanoids standing on a floor). As 
a result, humanoids are frequently controlled using feedback controllers (e.g., PD, PID, 
PIDD 2 , etc.) These controllers necessarily result in deviation from the commanded trajectory 
because their generated motor commands are solely a function of command error. 
For systems with holonomic differential constraints, deviation from the reference trajectory 
is the primary concern. For systems that are nonholonomically constrained, such as bipeds, 
deviation results in an additional difficulty: correcting deviation may require solving a two- 
point boundary value problem. As Section 4.1.3 noted, determining a general solution to 
this problem for humanoids is currently not practical. Thus, the question remains: given 
that the robot has deviated from its trajectory and is not in danger of colliding with 
obstacles, how is the deviation corrected? As in Section 4.1.3, we assume that the humanoid 
has a locomotion controller with inputs of desired planar position and orientation 
differentials for the base. We make no assumptions about the types of controllers employed 
for motor control. If the robot deviates from its planned trajectory for either the base or one 
of the joints, feedback is incorporated into the next commands sent to the controller. If the 
locomotion controller is unable to accomodate the desired differentials due to kinematic 
constraints 5 , it reports this problem, and a new path for the base is planned that respects 
these constraints. The advantage of this approach is that the planning method can remain 
ignorant of the robot's dynamics model and controller internals. The most significant issue 
remaining is to determine whether the humanoid is capable of becoming stuck in a 
particular location due to kinematic constraints; however, this issue is entirely dependent 
upon the robot's mechanics and controllers and is thus robot-specific. 



5 As in Section 4.1.3, we assume that the trajectory has been rescaled to respect dynamic constraints. 
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4.3.2 Global methods for execution monitoring 

The execution of the planned trajectory must be monitored to avoid both dynamic obstacles 
and static obstacles that were circumlocuted during planning but, due to unintentional 
deviation from the trajectory, that the robot is in imminent danger of contacting. Ideally, a 
policy would exist that could issue avoidance commands to the robot with expediency. Such 
a policy, denoted asuf- Jt(*4), would issue commands (i.e., u) as a function of the current 
state of the robot (q, <j) and the current obstacle region, . This policy theoretically could be 
constructed offline through reinforcement learning or dynamic programming; practically, 
however, the high-dimensionality of the robot's state-action spaces and the infinite number 
of configurations of the environment make this approach intractable (Bellman, 1957). 
Alternatively, it is conceivable that a policy could be determined nearly instantaneously 
using reinforcement learning as the environmental model changes (i.e., 7i(.) would be a 
function of the robot's configuration only). Again, such an approach is currently infeasible 
given the robot's high-dimensional state-action spaces, and dynamic replanning would 
likely prove to be a better alternative. Yang & LaValle (2000) provide a method for feedback 
motion planning, which entails constructing motion plans that are viable from entire regions 
of state-space. Perhaps because objective criteria, upon which dynamic programming is 
based, do not have to be considered, their method seems to be slightly more tractable than 
dynamic programming. However, this method has only been applied to five-dimensional 
configuration spaces, and it appears unlikely to scale much further. 

The methods described above are known as "global" methods, because good actions (i.e, 
those that lead to the goal and away from obstacles) are always generated, regardless of the 
configurations of the robot and environment. As noted above, the combined state-action 
spaces are immense, thus generally preventing global methods from being used for 
humanoids. "Local methods", in contrast to global methods, are generally tractable, but 
may result in failure to reach the goal; fortunately, these methods typically work well at 
avoiding obstacles. 

4.3.3 Local methods for execution monitoring 

One local method, by Brock & Khatib (2002), currently permits reaching with fast obstacle 
avoidance. This method is based on elastic strips, an extension of Khatib's operational space 
framework Khatib (1986) that employs virtual rubber bands. These virtual rubber bands 
warp planned trajectories away from obstacles. The most significant drawback to using 
elastic strips lies within its use of the robot's joint-space inertia matrix; this matrix is 
frequently unknown-it requires precise knowledge of the robot's inertial properties- and 
changes if the robot carries objects or interacts with the environment. In addition, like all 
local methods, elastic strips can cause the robot to become stuck in local minima. However, 
Brock did implement this method on both a mobile manipulator in an environment 
populated with moving obstacles and a simulated humanoid. 

4.3.4 Summary 

Control and locomotion are both significant problems in humanoid robotics, and assessing 
future prospects in these areas is difficult. However, the assumption of a planar controller 
for locomotion seems reasonable; it would prove difficult to have the roboticist juggle 
balance, locomotion, and task performance simultaneously. Indeed, many current 
humanoid designs prohibit direct access to actuators. 
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For execution monitoring, only elastic strips has proven successful for real-time obstacle 
avoidance on a mobile manipulator that followed a given trajectory. Depending on how 
seriously inaccuracies in the joint-space inertia matrix affect elastic strips, this method might 
prove extendible to humanoids. Viable alternatives include dynamic replanning and the 
offline-online probabilistic roadmap based methods of Jaillet & Simeon (2004) and van den 
Berg & Overmars (2005) described in Section 4.1, using dense roadmaps of configuration 
space. Regardless of the method chosen for execution monitoring, dynamic obstacles still 
require representation in the robot's perceptual model of the environment, as described in 
Section 4.2. 



5 Conclusion 

The past decade has seen remarkable progress in the types of motion planning problems 
that have been solved. However, many of these successes have been in application domains 
far removed from robotics. Additionally, humanoids have yet to perform tasks 
autonomously in human environments. This chapter formally defined one of the most 
important problems for humanoids, efficient, safe reaching in dynamic, unknown 
environments. Progress in the three areas critical areas to solving this problem- algorithms, 
perception for modeling, and execution-was surveyed and future prospects were examined. 
Table 1 summarizes the key issues in these areas as well as progress and prospects. 



Issue 


Progress and prospects 


Planning under uncertainty 


Difficult for humanoids, possible solution is dynamic 
replanning 


Incompleteness resulting 
from multiple IK solutions 


Solved by (Bertram et al., 2006) and (Drumwright & Ng- 
Thow-Hing, 2006) 


Planning to nonstationary 
goals 


Has received little focus, possible solution is modification 
of existing plans using inverse kinematics 


Planning in dynamic 
environments 


Dynamic replanning and offline-online probabilistic 
roadmaps seem to be promising directions; effectiveness 
toward humanoid reaching needs to be assessed 


Representation of 
environment model 


Octree approach works well, high-level features might 
work better in the future 


Localization 


Highly active area of research; (Fox et al., 1999) indicates 
good localization feasible even in nonstationary 
environments 


Modeling nonstationary 
environments 


Considerable work remaining; current work fails to 
address movable obstacles and dynamic obstacles 
simultaneously 


Exploration 


Considerable work remaining; exploration for planning 
humanoid motions yet to be addressed 


Controller deviation 


Better controllers will address this problem; can be 
mitigated by assuming differential locomotion controller 


Execution monitoring 


(Brock & Khatib, 2002) is one possible solution; dynamic 
replanning is another 



Table 1. The core issues with 
brief summaries of the finding 



respect to humanoid reaching discussed in this chapter with 
;S. 
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With respect to algorithms, control, and execution monitoring, dynamic replanning seems 
to be one of the best solutions for humanoid reaching. As computing power continues to 
grow, dynamic replanning becomes increasingly viable for planning reaching in 
nonstationary environments. For modeling the robot's environment, computational power 
seems to be less of an issue. Instead, new algorithms are required to balance the tradeoff 
between exploration and exploitation and to perceive and classify fully dynamic and semi 
dynamic obstacles. These three areas critical to humanoid reaching are currently the focus of 
intensive research, and results applicable toward humanoid reaching are on the horizon. 
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1. Introduction 

Biped robots have potential ability to elevate mobility of robotic system and they attract 
general attention in the last decade. Due to their form, it is easy to introduce them into our 
living space without preparing special infrastructure. There are many theoretical and 
experimental studies on the biped robots. Recently, several autonomous biped humanoid 
robots have been developed. In 1996, the first autonomous bipedal humanoid robot with 
battery power supply was developed by Honda Motor Co., Ltd. (Hirai, et al., 1998). 
Takanishi and his co-workers at Waseda University built a human-size bipedal humanoid 
robot and proposed a basic control method of whole body cooperative dynamic biped 
walking (Yamaguchi et al., 1999). Same group also developed a biped robot having two 
Stewart platforms as legs (Sugahara, et al., 2003). Kaneko and his colleagues at the National 
Institute of Advanced Industrial Science and Technology (AIST) also developed a bipedal 
humanoid robot of 1.54m height and 58kg weight mainly for the purpose of investigating 
fundamental techniques and applications (Kaneko et al., 2004). Same group also proposed a 
method of a biped walking pattern generation by using a preview control of the zero- 
moment point(ZMP) (Kajita, et al., 2003). The ZMP and its extension (FRI; foot-rotation 
indicator) are basic stability criteria for biped robots (Vukobratovic, et al, 2001, Goswami, 
1999). Nishiwaki and his co-researchers at University of Tokyo studied a humanoid walking 
control system that generates body trajectories to follow a given desired motion online 
(Nishiwaki, et al. 2003). Loffler and his colleagues at Technical University of Munich also 
designed a biped robot to achieve a dynamically stable gait pattern (Loffler, et al., 2003). 
Generally, careful design is required for development of a bipedal robot. Selection of gears 
and actuators, taking their rate, power, torque, and weight into account, is especially 
important. Developments of power conversion technology based on semiconductor 
switching devices and rare-earth permanent magnet materials Nd-Fe-B in combination with 
optimal design of the electromagnetic field by the finite element method enable 
improvement of power/ weight ratio of actuators. They contribute to the realization of such 
autonomous biped robots. However, they are still underpowered to achieve fast walking 
and running motions as the same that human does. There are several researches on running 
control of biped model (Raibert, 1986, Hodgins, 1996, Kajita, et al., 2002). The first human- 
size running robot was developed at AIST (Nagasaki, et al., 2004). In December 2005, Honda 
Motor Co., Ltd. announced their new bipedal humanoid robot that could run at 6km/h. 
Kajita, et al. proposed a method to generate a running pattern and reported that it requires 
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at least 28 to 56 times more powerful actuators than that of their actual humanoid robot 
HRP1, and also the consumption power is estimated ten times bigger than the human 
runner (Kajita, et al., 2002) 

In this chapter, a method to generate a trajectory of a running motion with minimum energy 
consumption is introduced. It is useful to know the lower bound of the consumption energy 
when we design the biped robot and select actuators. The generation of low-energy 
trajectories for biped robots remained an open problem. Usually, it is formulated as an 
optimal control problem. Since symbolic expression of motion equation of robots becomes 
extremely complicated in the case that the number of links increases, only specific simple 
type of a structure of robots was investigated and simplified assumptions such as ignoring 
the effects of centripetal forces were made in the past works (Roussel, et al., 1998). In this 
chapter, exact and general formulation of optimal control for biped robots based on 
recursive representation of motion equation is introduced. As an illustrative numerical 
example, the method is applied to a five link planar biped robot of 1.2m height and 60kg 
weight. The robot with the generated minimum energy running pattern with lm/sec speed 
consumes only 45W on average. It is found that big peak power and torque is required for 
the knee joints but its consumption power is negative small and the main work is done by 
the hip joints. 

The rest of this chapter is organized as follows. In Section 2, the problem definition is 
introduced where the formulation of the biped running robot is given. The minimization of 
consumption energy is explained in the Section 3, and the computational scheme is 
proposed in the Section 4. The numerical study of a five link planar biped robot is provided 
in Section 5, and conclusions are outlined in the Section 6. 

2. Model of Biped Running Robot 

2.1 Basic Model of a Biped Robot 

Consider a three dimensional bipedal robot with open-chain mechanism consisting of N 
joints and N+l rigid links. Since the robot is not fixed on the ground, it is modelled as a free- 
fall manipulator which has N+6 motion-degree-of-freedom (Fujimoto & Kawamura, 1995, 
1998). The model in general form is given by 

H'x + Cx + g = u + J T f E (l) 

where H e 9? ' **""' **" is an inertia matrix, C e j{"** 6MA '+ 6 > specifies centrifugal and Coriolis 
effects, g g yi "«• specifies gravity effects, x = (0,<p,p) e 9? *"" specifies displacements of 
joints, and posture and position of a base link, u = («,0,0) e 5? "** specifies input generalized 
forces, f g <R 6 is external force and J e J}'* 1 **" is a Jacobian matrix for tip of the support 
leg. Also #,«e$R" specify joint angles and input joint torques, respectively. p,^e!R M 
specify posture and position of a base link, respectively. An example of the coordinates for 
the case of planner biped is shown in Fig. 1. 

2.2 Model of Support Phase 

The motion of running is decomposed to two phases; single-leg support phase and flight 
phase. 
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Fig. 1. A five link biped robot. 

Let the tip position of the support leg with respect to the origin of the base link system be 
h(0,<p) ■ Then its position with respect to the origin of the world coordinate system is 
represented by 

y = h{G,(p)+p (2) 

Since the foot of the support leg is fixed on the ground during single support phase, it is 
subject to the following conditions. 

y = J9 + R(p+p = (3) 

y = J6 + Je + R(p + R<p + p = (4) 

where J = dhl d0 T and R = dhl d(p T are Jacobian matrices. 

Eliminating p , p , and f from (1) using (3) and (4), the following dynamics is obtained. 



H,x, + C ,x„ 



(5) 



where x = (0,m) and u = (w,,0)- The subscript s represents variables during support 
phase. The state equation is given by 

H s [ (",-C s x s -g s ) 

where w s = (x s ,X s ) = (6 s ,(p s ,0 s ,<p s ) e y\ 2N+6 is a state vector. 



W, 



(6) 



2.3 Model of Flight Phase 

During flight phase, the external force f in the motion equation (1) is zero. The 

conservation law of angular momentum is already included equivalently, which 
corresponds to no existence of external forces. Therefore the dynamics becomes 

H f X f + C f X f + g f = U f (7) 

where v = (0 m p ) and u f = (« ,,0,0) • The subscript /represents variables during flight 

phase. The state equation is given by 



w. 



) 



H~/(u f -C f x f -g f) 
where Wf = (x f ,x f ) = (0 f ,<p f ,p,0 f ,<p f ,p)e 9? 2<jV+6) is a state vector. 



(8) 
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3. Generation of Minimum Energy Consumption Gait 

It is useful to know the lower bound of the consumption energy when we design the bipedal 
robot and select actuators. An ideal actuator is assumed in this chapter to investigate the 
consumption energy, although real actuators with high-ratio gears such as harmonic gears 
have large frictions and roughly 70% of energy efficiency. The energy regeneration is 
considered. The problem is to find input joint torques and initial posture that minimize 
input energy during running motion under the condition that the robot takes completely 
periodic and symmetric motion, given the step period and the stride. The problem is 
described as follow. 

(9) 



minimize 



f 



6'ndt 



subject to 



0(T) = K0(O) 
<p(T) = <p(0) 
' p(T)^p(0)+S 
Hx + Cx + g = u + J f E 
where T is a period for one step. S is the stride. K is a coordinate conversion matrix; 



(10) 



/„ 





0" 








/ 





/, 






(11) 



where / and / are identity matrices whose dimensions are same as number of joints in 

body and one leg, respectively. 

Since the structure of dynamics varies depending on the phase as shown in the previous 

section, a reflection of time axis is introduced. A new time axis is given by 



aT 

T-t 



for < / < aT 
for aT<t<T 



(12) 



[{1- a y 

The timing chart of events are shown in Fig. 2. All variables in the rest of this section are 

functions of r ■ 

flight support flight support 

phase phase phase phase 

taldng-off landing taldng-off landing taldng-off 







aT 



T 



1 

Fig. 2. Timing chart of events. 

The objective function (9) is represented by 

E= f 6 T ndt+(9 T ndt 

= | {a6[n s + (l - a)0 T f n f )rdz 



(13) 
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The state equations (6) for < t < aT and (8) for aT <t <T are transformed onto r -axis as 
follows. 

(14) 



dw s 

— - = aT 
dr 


x s 
H-\u s -C s x s -g s ) 




dw 

— - = (a 
dr 


-\)T 


HJ 1 \u f -C f x f 


Sf\ 



(15) 

State variables should include the support phase ratio a in order to find its optimal value as 
well. The support phase ratio is constant. Then, the following differential equation is 
introduced. 

da 



dr 







Finally, the problem (9) (10) is transformed into a Bolza problem, 
minimize 



^ = ^z 1 ) + |[/ (z(r),v(r)) + ^r[^-/(z(.),v(r))] 



dz 



(16) 



(17) 



where z = ( Wj , Wf ,a) = (x s , x s , x f , x f , a) = (6>, , <p s , 9 S , <p s , 9 f , <p f , p f , 9 f ,<p f ,p f ,a)e Vi 4N+l9 is a 
state vector, y = (w ,«,)e9? 2jV is a input vector, /le'JJ 4,v+19 is a Lagrange multiplier, 



z(l) is state at the terminal period, and 



f (z, v) = {aG*n s + (l - a)9 T f n f )t 

aTx s 

aTH; l (u s -C s x s ~g s ) 
f{z,v)= (a-l)Tx f 

{a - \)THf (u f - C f x f - g f ) 




(18) 



(19) 



The function g(z.) represents penalty for the terminal condition that is introduced to 
guarantee continuity of the state variable at the instant of the taking-off . 

g{z l ) = w{\9 s {\)-9 J {\^ +M(\)-0 f (\t + 1^(1)- P/(l)f 

V x (20) 

+ ||^(l)-^/(l)| +h(l)-P/(l| +|P,(1)-P/(1| J 
where W is an weight coefficient. The variables p and p are implicitly defined, which are 
represented by functions of w ■ Variation of the extended objective function E is given by 

! 'df a df , dA) T 



dz„ I dz, 



,)% + l(^::... , u 



dz dz 



dr ) 



dv dv (dr ' 



A 



(21) 



c/r 
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where 3 = A(0) and X, = A(l) ■ In this equation, we assume that the initial state z is a 

function of certain variables which consist of partial set of the state, namely, a part of the 
initial state is independent and the other depends on it. Let the independent initial state 
variables be z ' = (0 s (O),(p s (O),0 f (O),(p f (O),p f (O),a(O)) ■ The rest of the initial state are 
decided by 

f (o)=K0 s (o) 

<p f (0)=<p s (0) 

P f (o) = -h(0 s (o),<P s (o))+S 



(22) 
(23) 
(24) 



|W 




~K 0" 


>». 




0/0 



-H/Jj{J f H} 1 jJ) J f 



f (o)' 



(25) 



The first three equations are coordinate conversion at the instant of landing and the last is 
the condition of perfectly inelastic collision at the instant of landing. Let the impulsive 
external force at the foot of support leg be Sf ■ The impact force Sf is inflicted at the instant 

of the landing and the generalized velocity changes discontinuously. From (1), the 
generalized momentum after the collision is given by 

Hx + =Hx_+J T Sf (26) 

where x and x denote the generalized velocities after and before collision, respectively. 
J =[J,R,I] is an extended Jacobian. Since it is support phase after the collision, the 
condition (3) holds, namely, 

Jx + = (27) 

Describing (26) and (27) for % ar >d df , the following equation is obtained. 



H 
-J 



J' 




Sf 



m_ 

o 



(28) 



Eliminating Sf from (28), we have 

i + =\I-H- x J T {jH- l J T Yj)ic_ (29) 

Here, % corresponds to i.(0) which is the generalized velocity of the support phase at 
T = , and x_ corresponds to x , (0) which is the generalized velocity of the flight phase at 

T = . Taking into account the coordinate conversion between left and right leg, (29) is 
transformed into the form of (25). 

From (21), the following conditions are obtained. 

dg (30) 

dz. 



K 



dX 
dz 
dz 
~dr~ 



df df 7 



dz 
f 



dz 



i 



(31) 



(32) 
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Also the gradients are given by 



BE 

dz' 



8z » 1 

8z n 



(33) 



8E 8f 8f 



— --^-^—X (34) 

dv dv dv 

To find the optimal solution, the conjugate gradient method in infinite dimensional 
space (Hilbert space) is applied to this problem. The procedures of the algorithm are as 
follows. 

1) The initial solution (z„,v(r)) is given. 

2) The initial state z n is computed by (22)-(25). 

3) The differential equation (32) is solved using z as the initial condition. 

4) Xy is computed by (30) using the final value z, • 

5) The differential equation (31) is backwardly solved using z, ■ 

6) The gradients for z ' and v(r) are computed by (33)(34) using z(r), A(t), and 

V(T). 

7) The temporary solution (zj.,v(r)) is updated toward the direction of the conjugate 

gradient. 

8) If the gradient is not small enough, return to 2. 

Finally, the input joint torques n(t), the joint angles 0{t) , the posture and position (of the 
base link) q>(t) , pit) , their derivatives 0{t) , <p(t) , and the support phase ratio a are 
obtained. A general method to compute the partial derivatives in (30)-(34) is proposed in the 
next section. 

4. Computational Scheme for Partial Derivative 

It is difficult to calculate the partial derivatives in (30)-(34) symbolically, because basically it 
costs very much to obtain a symbolic expression of the equation of motion (1). In this section, 
a computational scheme for the partial derivatives based on numerical representation of 
motion equation is proposed. Finally we can compute them easily by using forward- 
backward recursive Newton-Eular formulation. 
Each partial derivative appeared in (30)-(34) is represented as follows. 
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The partial derivatives appeared in (38), (39), (44), (48), and (49) are computed by using 
modified Newton-Euler formulations. 

5. Numerical Study of Five-link Planar Biped 

The proposed method is applied to a five-link planar biped robot. The specification of the 
robot and the control parameters are shown in Table 1. The robot is 1.2 [m] height and 60 
[kg] weight. The coordinates are taken as shown in Fig. 1. 

The optimal trajectories are computed as shown in Fig. 3-Fig. 6. Snapshots of the 
running motion are also shown in Fig. 7. The solid lines and the dashed lines show the 
trajectories for the right leg and the left leg, respectively. In Fig. 4, there are some 
discontinuous velocities due to the impact force at the instant of the landing. In Fig. 5, 
the peak torque for the hip joints appears at the beginnings of the swinging. For the 
knee joints, the torque becomes maximal at the instant of the landing and keeps over 
lOON.m. In Fig. 6, the positive peak power for the hip and knee joints appears during 
kicking motion. For the hip joints, the power becomes negative peak value at the 
beginning of the flight phase. This means that the hip joints absorb the energy of the 
kicking motion. The negative peak power for the knee joints appears at the instant of 
the landing. Namely, knee joints absorb the impact power between the foot and the 
ground. 



body length and weight 


0.6m, 20kg 


thigh length and weight 


0.3m, 10kg 


shin length and weight 


0.3m, 10kg 


total height & weight 


1.2m, 60kg 


stride S 


0.5m 


period of one step T 


0.5s 


running speed T/S 


lm/s 



Table 1. Specifications of robot and control parameters. 






hip 


knee 


peak angular velocity [rad/s] 


14.5 


11.6 


peak torque [N.m] 


48.2 


117.6 


peak power (positive) [W] 


355 


1265 


peak power (negative) [W] 


-699 


-849 


consumption power [W] 


27.9 


-5.37 


total consumption power [W] 


45.1 



Table 2. Actuator requirement. 



Table 2 shows requirements for the actuators based on this result. It is found that very big 
power is required for knee joints. However, its total consumption power has small negative 
value. Therefore, the main work is done by the hip joints. Since the negative power is also 



Minimum Energy Trajectory Planning for Biped Robots 



237 



big, for real robots, the introduction of the energy regeneration mechanism such as elastic 
actuators or combination of high back-drivable actuators and bidirectional power converters 
is effective to reduce the total consumption power. 
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(a) hip joint 
Fig. 3. Joint angles (solid line: right leg, dashed line: left leg). 
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(b) knee joint. 
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Fig. 4. Angular velocities of joints (solid line: right leg, dashed line: left leg). 
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Fig. 5. Joint torques (solid line: right leg, dashed line: left leg). 
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Fig. 6. Joint powers (solid line: right leg, dashed line: left leg). 
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Fig. 7. Snapshots of running trajectory. 
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6. Conclusion 

In this chapter, the method to generate a trajectory of a running motion with minimum 
energy consumption is proposed. It is useful to know the lower bound of the consumption 
energy when we design the bipedal robot and select actuators. The exact and general 
formulation of optimal control for biped robots based on numerical representation of 
motion equation is proposed to solve exactly the minimum energy consumption trajectories. 
Through the numerical study of a five link planar biped robot, it is found that big peak 
power and torque is required for the knee joints but its consumption power is small and the 
main work is done by the hip joints. 
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1. Introduction 

Robust real-time stereo facial feature tracking is one of the important research topics for a 
variety multimodal Human-Computer, and human robot Interface applications, including 
telepresence, face recognition, multimodal voice recognition, and perceptual user interfaces 
(Moghaddam et al., 1996; Moghaddam et al., 1998; Yehia et al, 1988). Since the motion of a 
person's facial features and the direction of the gaze is largely related to person's intention 
and attention, detection of such motions with their 3D real measurement values can be 
utilized as a natural way of communication for human robot interaction. For example, 
addition of visual speech information to robot's speech recognizer unit clearly meets at least 
two practicable criteria: It mimics human visual perception of speech recognition, and it 
may contain information that is not always present in the acoustic domain (Gurbuz et al., 
2001). Another application example is enhancing the social interaction between humans and 
humanoid agents with robots learning human-like mouth movements from human trainers 
during speech (Gurbuz et al., 2004; Gurbuz et al., 2005). 

The motivation of this research is to develop an algorithm to track the facial features using 
stereo vision system in real world conditions without using prior training data. We also 
demonstrate the stereo tracking system through a human to humanoid robot mouth 
mimicking task. Videre stereo vision hardware and SVS software system are used for 
implementing the algorithm. 

This work is organized as follows. In section 2, related earlier works are described. Section 3 
discusses face RIO localization. Section 4 presents the 2D lip contour tracking and its 
extention to 3D. Experimental results and discussions are presented in Section 5. Conclusion 
is given in Section 6. Finally, future extention is described in Section 7. 

2. Related Work 

Most previous approaches to facial feature tracking utilize skin tone based segmentation 
from single camera exclusively (Yang & Waibel, 1996; Wu et al., 1999; Hsu et al., 2002; 
Terrillon & Akamatsu, 1999; Chai & Ngan, 1999). However, color information is very 
sensitive to lighting conditions, and it is very difficult to adapt the skin tone model to a 
dynamically changing environment in real-time. 
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Kawato and Tetsutani (2004) proposed a mono camera based eye tracking technique based 
on six-segmented filter (SSR) which operates on integral images (Viola & Jones, 2001). 
Support vector machine (SVM) classification is employed to verify pattern between the eyes 
passed from the SSR filter. This approach is very attractive and fast. However, it doesn't 
benefit from stereo depth information. Also SVM verification fails when the eyebrows are 
covered by the hair or when the lighting conditions are significantly different than the SVM 
training conditions. 

Newman et al., (2000) and Matsumoto et al., (19990) proposed to use 3D model fitting 
technique based on virtual spring for 3D facial feature tracking. In the 3D feature tracking 
stage each facial feature is assumed to have a small motion between the current frame and 
the previous one, and the 2D position in the previous frame is utilized to determine the 
search area in the current frame. The feature images stored in the 3D facial model are used 
as templates, and the right image is used as a search area firstly. Then this matched image in 
2D feature tracking is used as a template in left image. Thus, as a result, 3D coordinates of 
each facial feature are calculated. This approach requires 3D facial model beforehand. For 
example, error in selection of a 3D facial model for the user may cause inaccurate tracking 
results. 

Russakoff and Herman (2000) proposed to use stereo vision system for foreground and 
background segmentation for head tracking. Then, they fit a torso model to the segmented 
foreground data at each image frame. In this approach, background needs to be modeled 
first, and then the algorithm selects the largest connected component in the foreground for 
head tracking. 

Although all approaches reported success under broad conditions, the prior knowledge 
about the user model or requirement of modeling the background creates disadvantage for 
many practical usages. The proposed work extends these efforts to a universal 3D facial 
feature tracking system by adopting the six-segmented filter approach Kawato and 
Tetsutani (2004) for locating the eye candidates in the left image and utilizing the stereo 
information for verification. The 3D measurements data from the stereo system allows 
verifying universal properties of the facial features such as convex curvature shape of the 
nose explicitly while such information is not present in the 2D image data directly. Thus, 
stereo tracking not only makes tracking possible in 3D, but also makes tracking more robust. 
We will also describe an online lip color learning algorithm which doesn't require prior 
knowledge about the user for mouth outer contour tracking in 3D. 

3. Face ROI Localization 

In general, face tracking approaches are either image based or direct feature search based 
methods. Image based (top-down) approaches utilize statistical models of skin color pixels 
to find the face region first, accordingly pre-stored face templates or feature search 
algorithms are used to match the candidate face regions as in Chiang et al. (2003). Feature 
based approaches use specialized filters directly such as templates or Gabor filter of 
different frequencies and orientations to locate the facial features. 

Our work falls into the latter category. That is, first we find the eye candidate locations 
employing the integral image technique and the six segmented rectangular filter (SSR) 
method with SVM. Then, the similarities of all eye candidates are verified using the stereo 
system. The convex curvature shape of the nose and first and second derivatives around the 
nose tip are utilized for the verification. The nose tip is then utilized as a reference for the 
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selection of the mouth ROI. At the current implementation, the system tracks the person 
closest to the camera only, but it can be easily extended to a multiple face tracking 
algorithm. 

3.1 Eye Tracking 

The pattern of the between the eyes are detected and tracked with updated pattern matching. 
To cope with scales of faces, various scale down images are considered for the detection, and 
an appropriate scale is selected according to the distance between the eyes (Kawato and 
Tetsutani, 2004). The algorithm calculates the intermediate representation of the input image 
called "Integral image", described in Viola & Jones (2001). Then, a SSR filter is used for fast 
filtering of bright-dark relations of the eye region in the image. Resulting face candidates 
around the eyes are further verified by perpendicular relationship of nose curvature shape as 
well as the physical distance between the eyes, and eye level and nose tip. 

3.2 Nose Bridge and Nose Tip Tracking 

The human nose has a convex curvature shape and the ridge of the nose from the eye level 
to the tip of the nose lies on a line as depicted in Fig. 1. Our system utilizes the information 
in the integral intensity profile of convex curvature shape. The peak of the profile of a 
segment that satisfies Eqn. 1 using the filter shown in Fig.2 is the convex hull point. A 
convolution filter with three segments traces the ridge with the center segment greater than 
the side segments, and the sum of the intensities in all three segments gives a maximum 
value on the convex hull point. Fig.2 shows an example filter with three segments that traces 
the convex hull pattern starting from the eye line. The criteria for finding the convex hull 
point on an integral intensity profile of a row segment is as follows, 

Si < $2 < 5 ;{ mid arg{max(5i + 352 + 53)}, (1) 

3 

where Si denotes the integral value of the intensity of a segment in the maximum filter shown 

in Fig. 2, and j is the center location of the filter in the current integral intensity profile. The 

filter is convolved with the integral intensity profile of every row segment. A row segment 

typically extends over 5 to 10 rows of the face ROI image, and a face ROI image typically 

contains 20 row segments. Integral intensity profiles of row segments are processed to find 

their hull points (see Fig.l using Equation 1 until either the end of the face ROI is reached or 

until Eqn. 1 is no longer satisfied. For the refinement process, we found that the first derivative 

of the 3D surface data as well as the first derivative of the intensity at the nose tip are 

maximum, and the second derivative is zero at the nostril level (Gurbuz etal., 2004a). 



Convex hull pi) i 111 — *% 




Nose tip 
Fig. 1. Nose bridge line using its convex hull points from integral intensity projections. 
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t width = 1 



Fig. 2. A three-segment filter for nose bridge tracing. 

4. Lip Tracking 

The nose tip location is then utilized for the initial mouth ROI selection. Human mouth has 
dynamic behavior and even dynamic colors as well as presence or absence of tongue and 
teeth. Therefore, at this stage, maximum-likelihood estimation of class conditional densities 
for subsets of lip (wi) and non-lip (W2) classes are formed in real-time for the Bayes decision 
rule from the left camera image. That is, multivariate class conditional Gaussian density 
parameters are estimated for every image frame using an unsupervised maximum- 
likelihood estimation method. 



4.1 Online Learning and Extraction of Lip and Non-lip Data Samples 

In order to alleviate the influence of ambient lighting on the sample class data, 
chromatic color transformation is adopted for color representation (Chiang et al., 2003; 
Yang et al., 1998). It was pointed out (Yang et al., 1998) that human skin colors are less 
variant in the chromatic color space than the RGB color space. Although in general the 
skin-color distribution of each individual may be modeled by a multivariate normal 
distribution, the parameters of the distribution for different people and different 
lighting conditions are significantly different. Therefore, online learning and sample 
data extraction are important keys for handling different skin-tone colors and lighting 
changes. To solve these two issues, the authors proposed an adaptation approach to 
transform the previous developed color model into the new environment by 
combination of known parameters from the previous frames. This approach has two 
drawbacks in general. First, it requires an initial model to start, and second, it may fail 
in the case of a different user with completely different skin-tone color starts using the 
system. 

We propose an online learning approach to extract sample data for lip and non-lip classes to 
estimate their distribution in real time. Chiang et al. (2003) in their work provides hints for 
this approach. They pointed out that lip colors are distributed at the lower range of green 
channel in the (r,g) plane. Fig. 4 shows an example distribution of lip and non-lip colors in 
the normalized (r,g) space. 

Utilizing the nose tip, time dependent (r,g) spaces for lip and non-lip are estimated for 
every fame by allowing £ % (typically 10%) of the non-lip points stay within the lip (r,g) 
space as shown in Fig. 4. Then, using the obtained (r,g) space information in the initial 
classification, the pixels below the nostril line that falls within the lip space are considered 
as lip pixels, and the other pixels are considered as non-lip pixels in the sample data set 
extraction process, and RGB color values of pixels are stored as class attributes, 
respectively. 
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Fig. 3. Left image: result of the Bayes decision rule, its vertical projection (bottom) and 
integral projection of intensity plane between nose and chin (right). Middle image: 
estimated outer lip contour using the result of the Bayes rule. Right image: a parameterized 
outer lip contour. 
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Fig. 4. Dynamically defined lip and non-lip (r,g) spaces. 

In most cases, sample data contains high variance and it is preferable to separate the data 
into subsets according to its time dependent intensity average. Let avgL and D/ ( be the 
intensity average and k th subset of the lip class, respectively. The subsets of the lip class are 
separated according to lip class 1 intensity average as 



assign to A if x int en S ity < avg L , 

assign to D 2 if avgiJ2 < Xi„t ensi t v < 3avgt/2, 

assign to D 3 if Xi nt ensitv > av 9L- 



(2) 



246 



Humanoid Robots, New Developments 



Using the same concept in Eqn. 2, we also separate the non-lip data samples into subsets 
according to intensity average of the non-lip class. Fig. 5 depicts simplified conditional 
density plots in ID for the subsets of an assumed non-lip class. 

P(x I nonLip) 




avgnon Lip/2 avis nonLip 3av«nonLip/2 
Fig. 5. Example class conditional densities for subsets of non-lip class. 

4.2 Maximum-Likelihood Estimation of Class Conditional Multivariate Normal 
Densities 

The mean vector and covariance matrix are the sufficient statistics to completely describe a 
distribution of the normal density. We utilize a maximum-likelihood estimation method for 
the estimation of a class conditional multivariate normal density described by 



1 1 _, T 

1 v/(27r)"||£,-|| l 2 l V ! 



(3) 



where 1 may be Wi, or Wj, or subset of a class, u. - E[x] is the mean value of the i th class. 
^. is the n x n (in this work, n is number of color attributes so n = 3) covariance matrix 
defined as 

Xi = E[(x-tH)(x-pa) T ] (4) 

where || • || represents the determinant operation, and E[] represents the expected value of a 
random variable. Unbiased estimates of the parameters //.and £ are estimated by using 
the sample mean and sample covariance matrix. 



4.3 Bayes Decision Rule 

Let x be an observation vector formed from RGB attributes of a pixel location in an image 
frame. Our goal is to design a Bayes classifier to determine whether x belongs to wi or 102 in 
two class classification problem. The Bayes test using a posteriori probabilities may be 
written as follows: 

p(Wi\x) § p(w 2 \x), (5) 

w 2 
where p(w- | x) is the a posteriori probability of W ( - given X . Equation (5) shows that if the 

probability of W l given X is larger than the probability of W 2 , then X is declared belonging 
to w / and vice versa. Since direct calculation of p( w . I x) is not practical, we can re-write the 
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a posteriori probability of W- using Bayes 1 Theorem in terms of a priori probability and the 
conditional density function p(x\ vf;) , as 

, . . p(x\Wi)p(Wi) 

P(W ' |X) = p(x) < 6 > 

where p(x) is the density function and is positive constant for all classes. Then, re-arranging 
both sides, we obtain 

_ p(x| Wl ) 5 pw 

MXJ p(xh)< P K) (7) 

where Z,(x) is called the likelihood ratio, and p(w,)/ p(w 2 ) is called the threshold value} of 
the likelihood ratio for the decision. Because of the exponential form of the densities 
involved in Equation (7), it is preferable to work with the monotonic discriminant functions 
obtained by taking the logarithm as follows. 

(/,,,, (x) = ln(p(x\wi)p(wi)). ^ 

thus, by re-arranging Equation (8), we get 

9m, (x) = -i(x - ih&THx - tnf + <k (9) 

Where 

c ; =hip(w;)-(l/2)ln27r-(l/2)||E;|| 

is a constant for this image frame. In general, Equation (9) has only nonlinear quadratic 
form and a summation, and using this equation, the Bayes rule can be implemented for real- 
time lip tracking as follows. 

CW I C( x ). ( 10 ) 

Where 

(/*(x) = max{<7, (x),g; (x).<7.| (x)} 
for i = {w, ,w 2 }and referring to Fig. 5. Threshold value of the likelihood ratio as shown in 
Eqn. (7) is based on a priori class probabilities. In our implementation, equally likely a priori 
class probabilities are assumed. 

4.4 Mouth Outer Contour Parameterization in 2D 

After mouth tracking algorithm locates the mouth region, outer lip contours of the speaker's 
lips in left camera image are detected (see Fig. 3). Then, the outer contour as a whole is 
parameterized by a generalized ellipse shape which is obtained using the estimated outer 
contour data. A parametric contour is found that corresponds to the general quadratic 
equation a x x 2 + a 2 xy + a 3 y 2 + a 4 x + a 5 y + a 6 = , where a. s are constants, and a x and 
a 3 are non-zero. Let us denote the 2D positions over the traced outer lip contour as 



(10) 



Xi X'2 Xj ... Xtf 

m m V3 ... vn_ 

The basic form used in the elliptical parameter estimation in matrix notation is Ma = 
where a = [a\ ■ ■ ■ ag ] The dimensionality of M is the number of points, N, in the segment 
multiplied by 6 (that is, N x 6). Each row of M corresponds to one point in the segment. The 
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parameters of each contour are then solved using the least-squares method to find (X t s, 

where ; = l,2,---,6 . 

Using the estimated parameters, parametric lip contour data can be re-generated for each 
image frame. Five points are sufficient to represent a general elliptical shape, leading to a 
significant data reduction and representation. 




Fig. 6. Screen capture of tracked outer lip contours for various skin tone colors and different 
lighting conditions. 



4.5 Estimation of 3D Mouth Outer Contour 

Once the outer lip contour points of the speaker's lips in left camera image are found then 
their stereo disparity values from the right image can be calculated utilizing the previously 
found horopter information. Fig. 7 shows stereo and disparity images. Knowing a pixel 
location (x,y) in the left camera image and its disparity in the right camera image, we can 
calculate its 3D (X,Y,Z) coordinates with respect to the camera coordinate system as shown 
in Fig. 8. 




Fig. 7. Screen capture of the left and right camera images, and their disparity map around 
the face region of interest. 
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Fig. 8. Screen capture of the left and right camera images, and OpenGL plot of texture 
mapped 3D face reconstructed by the stereo algorithm. 
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5. Experimental Results and Discussion 

In this paper, our work focused on a real-time stereo facial feature tracking algorithm. 
Intensity information is used for initial eye candidates in the left image. Relationship of the 
eyes and the nose are verified using 3D data. Then, the nose tip is utilized as a reference 
point for mouth ROI. RGB color attributes of lip pixels are used for the lip tracking. 
The proposed stereo facial feature tracking algorithm has been tested on live data from 
various users without using any special markers or paintings. Fig. 6 shows tracked lip 
contour results for various users under various lighting conditions. The stereo facial feature 
tracking algorithm which utilizes Videre stereo hardware works around 20 frames per 
second (un-optimized) on a 2 GHz notebook PC under Windows platform. We also 
demonstrate the developed stereo tracking system through a human to humanoid robot 
mouth mimicking task. The humanoid head (Infanoid) is controlled via serial 
communication connected to a PC. Commands to the mouth is send every 50 ms (20 hz), at 
the same rate as the processing of facial features tracking - thus, allowing real-time 
interaction. The communication between the vision system and the control PC is via a 1Gbit 
Ethernet network. The opening and closing of the person's mouth is directly mapped to the 
mouth of the humanoid's mouth, with a simple geometric transform. The Infanoid robot 
head is shown in Fig. 10 which was developed by Kozima (2000). 

6. Conclusions 

A new method for stereo facial feature tracking of individuals in real world conditions is 
described. Specifically, stereo face verification and an unsupervised online parameter 
estimation algorithm for the Bayesian rule is proposed. That is, a speaker's lip colors are 
learned from the current image frame using the nose tip as a reference point. Vertical and 
horizontal integral projections are utilized to guide the algorithm in picking out the correct 
lip contour. In the final stage, estimated outer contour data for every image frame of the left 
camera is parameterized as a generalized ellipse. Then, utilizing the contour pixel locations 
in the left camera image and their disparity of the right camera image, we calculate their 3D 
(X,Y,Z) coordinates with respect to the camera coordinate system. Future work for vision 
includes extraction of 3D coordinates of other facial points such as eyebrows, chin and 
cheek, and further extending the work to the multiple face tracking algorithm. 

7. Humanoid Robotics Future Extension 

Future extensions of this work include developing a machine learning method for smooth 
mouth movement behavior to enable humanoids to learn visual articulatory motor tapes for 
any language with minimal human intervention. Fig. 9 shows the flow diagram of the system. 
Such a system should extract and store motor tapes by analyzing a human speaker's audio- 
visual speech data recorded from predetermined phonetically-balanced spoken text to 
create a mapping between the sound units and the time series of the mouth movement 
parameters representing the mouth movement trajectories. These motor tapes can then be 
executed with the same time index of the audio, yielding biologically valid mouth 
movements during audio synthesis. 

We call the system the text-to-visual speech (TTVS) syntliesis system. It can be combined with a 
concatinative speech synthesis system, such as Festival (Black & Taylor, 1997; Sethy & Narayanan, 
2002; Chang et al., 2000) to create a text-to-audiovisual speech synthesis system for humanoids. 
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Fig. 9. Future extension to a TTS based speech articulation system for Humanoids. 




Fig. 10. Infanoid robot utilized for human to humanoid robot mouth imitation task. 



Real-time Vision Based Mouth Tracking and Parameterization for a Humanoid Imitation Task 251 

A concatenative synthesis system creates indexed waveforms by concatenating parts 
(diphones) of natural speech recorded from humans. Using the same concatenative 
synthesis concept, the proposed TTVS system can concatenate corresponding mouth 
movement primitives. Thus, the system is capable of generating sequences of entirely novel 
visual speech parameters that represent the mouth movement trajectories of the spoken text. 
A humanoid agent equipped with TTS and TTVS systems can produce novel utterances, 
and so is not limited to those recorded in the original audio-visual speech corpus. With 
these capabilities, the humanoid robot can robustly emulate a person's audiovisual speech. 
A detailed explanation of this extension is described in Gurbuz et al. (2004b). Also, we will 
extend the work to include imitation of other facial movements, as the vision system 
expands to stereo and track additional features such as eyebrows, and perform perception 
studies to ascertain the effect of more accurate speech and face movement cues on 
naturalness and perceptibility in humanoids. 
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1. Introduction 



Controlling of a biped walking mechanism is a very challenging multivariable problem, the 
system being highly nonlinear, high-dimensional, and inherently unstable. In almost any 
realistic case the exact dynamic equations of a walking robot are too complicated to be 
utilized in the control solutions, or even impossible to write in closed form. 
Data-based modelling methods try to form a model of the system using only observation 
data collected from the system inputs and outputs. Traditionally, the data oriented methods 
are used to construct a global black-box model of the system explaining the whole sample 
data within one single function structure. Feedforward neural networks, as presented in 
(Haykin, 1999), for example, typically map the input to the output with very complicated 
and multilayered grid of neurons and the analysis of the whole net is hardly possible. Local 
learning methods (Atkeson et ah, 1997), on the other hand, offer a more structured approach 
to the problem. The overall mapping is formed using several local models, which have a 
simple internal structure but are alone valid only in small regions of the input-output space. 
Typically, the local models used are linear, which ensures the scalability of the model 
structure: Simple systems can be modelled, as well as more complex ones, using the same 
structure, only the number of the local models varies. 

In robotics, local modelling has been used quite successfully to form inverse dynamics 
or kinematic mappings that have then been applied as a part of the actual controller 
(Vijayakumar et ah, 2002). However, when trying to cover the whole high-dimensional 
input-output space, the number of local models increases rapidly. Additionally, 
external reference signals are needed for the controller to get the system function as 
desired. 

To evaluate the assumption of simple local models, a feedback structure based on linear 
local models, clustered regression, is used here to implement the gait of a biped walking 
robot model. The local models are based on principal component analysis (see Basilevsky, 
1994) of the local data. Instead of mapping the complete inverse dynamics of the biped, 
only one gait trajectory is considered here. This means that the walking behaviour is 
stored in the model structure. Given the current state of the system, the model output 
estimate is directly used as the next control signal value and no additional control 
solutions or reference signals are needed. The walking cycle can become automated, so that 
no higher-level control is needed. 
This text summarizes and extends the presentation in (Haavisto & Hyotyniemi, 2005). 



254 



Humanoid Robots, New Developments 



2. Biped model 

2.1 Structure of the mechanism 

The biped model used in the simulations is a two-dimensional, five-link system which has a 
torso and two identical legs with knees. To ensure the possibility of mathematical 
simulation, the biped model was chosen to be quite simple. It can, however, describe the 
walking motion rather well and is therefore applied in slightly different forms by many 
researchers (see e.g. Chevallereau et al., 2003; Hardt et al., 1999; Juang, 2000). 
Fig. 1 shows the biped and the coordinates used. Each link of the robot is assumed to be a 
rigid body with uniformly distributed mass so that the centre of mass is located in the 
middle of the link. The interaction with the walking surface is modelled by adding external 
forces F to the leg tips. As the leg touches the ground, the corresponding forces are switched 
on to support the leg. The actual control of the biped gait is carried out using the joint 
moments M, which are applied to both thighs and knees. 



(**>») 





Fig. 1. The coordinates used (left) and the external forces (right). 

As such, the system has seven degrees of freedom, and seven coordinates must be chosen to 
describe the configuration of the biped. The coordinate vector now is 

q = (x y a P L P R y L r R f ■ (1) 

Here the subindices L and R refer to the "left" and "right" leg of the biped, respectively. The 
dynamic equations of the system can be derived using Lagrangian mechanics (Wells, 1967) 
and the result has the following form: 

A(q)q = b(q,q,M,F), (2) 

where the "dotted variables" denote time derivatives of the coordinate vector. The exact 
formulas of the inertia matrix A(q) and the right hand side vector are quite complex and 
therefore omitted here (see Haavisto & Hyotyniemi, 2004). The minimum dimension n of 
the state vector representing the system state is 14 (the 7 coordinates and their time 
derivatives); however, because of the efficient compression carried out by the proposed 
data-based modelling technique, the dimension of the "extended state" could be higher, 
including measurements that are not necessarily of any use in modelling (redundancy 
among the measurements being utilized for elimination of noise). 
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2.2 Simulator 

To simulate the biped walking on a surface, a Matlab/Simulink (MathWorks, 1994-2007) 
block was developed. It handles internally the calculation of the support forces F using 
separate PD controllers for each force component: The force opposing the movement is 
the stronger the farther the foot tip has gone beneath the ground surface (P term), and the 
faster the foot tip is moving downward (D term). By adjusting these controller 
parameters, different kinds of ground surface properties can be simulated (very hard 
surfaces, however, resulting in high force peaks, so that short simulation step sizes have 
to be applied). Also the knee angles are automatically limited to a range defined by the 
user. To obtain this, the simulation block adds an additional moment to the knee joints if 
the angle is exceeding or going under the given limits. This limitation can be used for 
example to prevent the knee turning to the "wrong" direction, that is, to keep the angles 
7L and y& positive. 
The input of the simulation block is the four dimensional moment vector 

M = (M U M R1 M L2 M R2 f, (3) 

and the output the 14-dimensional state of the system augmented with leg tip touch sensor 
values sl and sr is 



'kJ- (4) 



If the leg is touching the ground, the sensor value equals to 1, otherwise 0. 
The biped dynamics, ground support forces and knee angle limiters are simulated in 
continuous time, but the input and output of the block are discretized using zero order hold. 
This allows the use of discrete-time control methods. A more detailed description of the 
used biped model and simulation tool is presented in the documentation (Haavisto & 
Hyotyniemi, 2004). 

2.3 Data collection 

In general, the gait movement of a biped walker can be divided into two phases 
according to the number of legs touching the ground. In the double support phase (DSP) 
both legs are in contact with the ground and the weight is moving from the rear leg to 
the front leg. In the single support phase (SSP) only the stance leg is touching the ground, 
while the swing leg is swinging forward. Because the biped is symmetrical regarding the 
left and right leg, the whole gait can be described with one DSP and one SSP. In every 
other step the stance and swing leg signals are switched to model the left and right leg 
swing in turns. This means that it is necessary to model only one DSP and one SSP with, 
say, left leg acting as the rearmost or swing leg and right leg as the foremost or stance 
leg. Accordingly, the data collected during this work were transformed to cover only 
these DSP and SSP cases. 

In order to collect sample data from the gait of the walking biped, a rough PD control 
scheme was developed. It was based on predetermined reference signals and four separate 
and discrete PD controllers, which produced the moment inputs for the biped. Fig. 2 shows 
the closed loop simulation model of the controlled biped. The PD controller block includes 
the separate PD controllers, whereas the Biped model block simulates the actual walking 
system and the surface interaction. 
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Fig. 2. The sample data were produced by simulating the PD controlled biped 

The PD controller was able to control the biped so that a cyclic gait movement was attained. 
To reach more varying data, independent white noise was added to each moment signal 
component. The input and output data of the biped system were collected of a 40 seconds 
walk, during which the biped took over 40 steps. The sampling interval was equal to the 
discretization period h = 0.005 sec. 

The data were mean-zeroed and scaled to unit variance in order to make the different 
components equally significant. Also the xrj coordinate of the state vector was omitted 
because it is monotonically increasing during the walk instead of being cyclic. As a result, 
the modelling data had 8000 samples of the 15 dimensional (reduced) state vector and the 
corresponding four dimensional moment vectors. The points were located around a 
nominal trajectory in the state and moment space. 

In the following, the state vector at time kh is denoted by u(k) and the moment vector by y(k) 
since these are the input and the output of the controller. Additionally, the state vector 
without the touch sensor values is denoted by u(k) . 

3. Clustered Regression 

The data-based model structure, clustered regression, that is applied here is formed purely by 
the statistical properties of the sample data. The main idea is to divide the data into clusters 
and set an operating point in the centre of each cluster. Every operating point, additionally, 
has its own local linear regression model which determines the local input-output mapping 
in the cluster region. A scheme of the model structure is presented in Fig. 3. 




Fig. 3. The local models cover the whole data along the trajectory 



3.1 Local model structure 

Each local model calculates a linear principal component regression (PCR) estimate of the given 
regression input. Principal components show the orthogonal directions of the highest 
variances in the input data. It is assumed here that variance is carrying information and the 
most important principal components are therefore relevant in the data, whereas the smaller 
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ones can be omitted as noise. The idea of the PCR is first to map the input data to the lower 
dimensional principal component subspace and then use multivariable linear regression to 
get the output estimate. 

Let us denote the input signal for the local model p at the time instant kh as uP(k), and the 
corresponding output as yP(k). When the input data are mapped onto the principal 
components, one gets the latent variable signal xv(k). To simplify the regression structure the 
latent variable data are locally scaled to unit variance before the regression mapping. 
Now the whole regression structure of the local model p can be stored in the following 
statistics, which are calculated using the data in the corresponding cluster: 
CI Expectation value of the input vector up. 

C Expectation value of the output vector yP. 

P^. Inverse of the latent variable data xP covariance matrix. 

R^ Cross-covariance of the latent and reduced input data. 

R*L Cross-covariance of the ouput and latent variable data. 

R^ u Covariance of the input data. 

Note that in the following the statistics are calculated using the collected data, which gives 
merely the estimates of the real quantities. It is assumed, however, that the estimates are 
accurate enough and no difference is made between them and the real values. 
The regression calculation itself is based on a Hebbian and anti-Hebbian learning structure 
(see 6.1). Assuming that the statistics correspond to the real properties of the data, the 
output estimate of the model p given an arbitrary input vector can be expressed in 
mathematical terms as 

y" (*) = R£ (PI f R; Cl («(*) -C>) + C; . (5) 

Here C? is the expectation value of the reduced state vector u p . In (5) the input is first 
transformed to the operating point p centred coordinates by removing the input mean value; 
then the estimate is calculated and the result is shifted back to the original coordinates in the 
output space by adding the output mean vector. 

A cost value for the estimate made in the unit p should be evaluated to measure the error of 
the result. The cost can depend for example on the distance between the cluster centre and 
the estimation point: 

r («(*)) = !(«(*) - c; f h" («(*) - a ) , (6) 

where HP is a constant weighting matrix. 

3.2 Combination of the local models 

The overall estimate of the clustered regression structure is calculated as a weighted average 
of all the local models. Assuming that the number of the operating points is N, one has 



£!*'(*)*'(*) 



m=^- N . (7) 
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The weights should naturally be chosen so that the currently best matching local models affect 
the final results the most, whereas the further ones are practically neglected. Let us choose 

UKj\ (8) 



H p 



that is, the weighting matrix for each local model cost calculation equals the scaled inverse 
of the input data covariance matrix, a being the scaling factor; assuming that the data in 
each local model are normally distributed, the maximum likelihood estimate for the combined 
output value is obtained when the weights in (7) are chosen as 

K"(k) = -^= [ = exp(-J"(u(k))), (9) 



M" (n p ) 



and cr=l. The simulations, however, showed that a more robust walking behaviour is 
reached with larger scaling parameter values. This increases the effect of averaging in the 
combination of the local model estimates. 
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Fig. 4. Clustered teaching data and the operating points 



4. Clustered Regression Control 

4.1 Teaching the model 

The clustered regression model was formed using the sample data collected from the PD 
controlled gait. The model input was the state vector u(k) and output the corresponding 
control signal y(k). The data were first divided into N = 15 clusters located along the 
nominal trajectory in the state and output space. Based on the data belonging to each cluster 
the estimates of the statistics listed in 3.1 were calculated with eight principal component 
directions in each model. 
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Fig. 4 shows part of the clustered teaching data projected to three input variables. The operating 
point centres are shown as large circles with the same shade as the data belonging to the 
corresponding clusters. Also the state of the system in the operating point centres is drawn. 
Clearly the start of a new DSP is located in the right part of the figure. Following the nominal 
trajectory the state of the biped then changes to a SSP after some four operating points. Finally, 
the step ends just before the beginning of a new DSP in the lower left corner of the figure, where 
the state of the system is almost identical with the original state, only the legs are transposed. 

4.2 Biped behavior 

To test the attained model, a Simulink block was developed to realize the estimation of the 
control related to the measured current state of the system. The block could now be used to 
control the biped instead of the PD controller module applied in the data collection. It appeared 
that the taught clustered regression model was able to keep the biped walking and produced a 
very similar gait as the original PD controller. Fig. 5 shows two steps of the both gaits. 
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Fig. 5. The learned gait was qualitatively quite similar to the original one 
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Fig. 6. The clustered regression controlled system (CRC) is functioning a bit slower than the 
PD controlled one. 
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The biggest difference between the two gaits is that the clustered regression controlled one 
is proceeding a little slower. When comparing the coordinate value changes, the lag of the 
clustered regression controlled system is clearly seen (Fig. 6). The variations in the PD 
controlled signals are due to the added noise in the teaching data generation. However, the 
clustered regression controller can reproduce the original behaviour very accurately. 

5. Optimization 

5.1 Dynamic programming 

It was shown that the clustered regression controller is able to repeat the unoptimized 
behaviour used in the teaching. It would be very beneficial, however, to reach a more 
optimal control scheme so that the biped could, e.g., walk faster with smaller energy 
consumption; this would also be a strong evidence of the validity of the selected model 
structure when explaining biological systems. It would be interesting to see if the biped 
would learn new modes of moving: For example, if the speed were emphasized, would it 
finally learn to run? 

The optimization procedure of the clustered regression structure could be compared with 
dynamic programming (Bellman, 1957) that is based on the principle of optimality. In this case 
this idea can be formulated as follows: 

An optimal control sequence has the property that whatever the initial state and initial 
control are, the remaining controls must constitute an optimal control sequence with 
regard to the state resulting from the first control. 

In general, a dynamic optimization problem can be solved calculating the control from the 
end to the beginning. Starting from the final state, the optimal control leading to that state is 
determined. Then the same procedure is performed in the previous state and so on. Finally, 
when the initial state is reached, the whole optimal control sequence is attained. 
This means that one can form the global optimized behaviour using local optimization. When 
compared with the clustered regression structure, the local optimization should now be 
done inside every cluster or operating point. When the system enters the operating point 
region, the controller is assumedly able to guide it optimally to the next region. 
Another fact from the theory of optimal control states that for a quadratic (infinite-time) 
optimization problem for a linear (affine) system, the optimal control law is also a linear 
(affine) function of the state. This all means that, assuming that the model is locally 
linearizable, a globally (sub)optimal control can be implemented in the proposed clustered 
regression framework. 

5.2 Optimization principle 

As the walking motion is cyclic, one cannot choose the "first" or "last" operating point. 
Instead, the optimization can be carried out gradually in each cluster. A trial-and-error 
based optimization scheme which was used successfully in the previous work (Hyotyniemi, 
2002) is presented in the following. 

As a starting point for the optimization an unoptimized but easily attained behaviour is 
taught to the clustered regression controller (the original PD control). Then the current cost 
of the control for one cycle is calculated. The cost criterion can be chosen rather freely so that 
in the minimum of the criterion the behaviour reaches the desired optimum - for example, 
low energy consumption and high walking speed. 
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When a stable and cyclic motion is obtained the control is slightly varied at one time instant, 
and the new data point is adapted to the clustered regression model. If the new model gives 
a better control, that is, the cost is now smaller, it is accepted. On the other hand, if the cost 
is higher, the new model is rejected. After that, a new change can be made and similarly 
evaluated. Repeating these steps the controller can hopefully learn a more optimal 
behaviour, although the process may be quite slow and nothing prevents it ending to a local 
minimum of the cost function. This random search is an example of reinforcement learning; 
now, because of the mathematically simple underlying model structure, the adaptation of 
the model can still be relatively efficient. 

5.3 Adaptive learning 

In order to optimize the control, it should be possible to update the clustered regression 
structure using new data. This adaptation, however, turned out to be quite hard to realize 
because of the properties of the data clusters. 

It was detected during the PD controlled gait reproduction that at least eight principal 
component directions need to be considered in each local PCR model to reach an accurate 
enough control estimate. The relative importance of each principal component can be 
described by the variance of the corresponding latent variable component. Fig. 7 shows the 
averages of the variances of the local models. 




Fig. 7. Averages of the principal component variances in the local models 

Clearly, the ratio of the first and last principal component variances is huge, which means 
that it is very hard to iteratively determine the last principal component directions. In 
practice, this means that when the training is repeated, always using the earlier model for 
construction of data for the next model, the model sooner or later degenerates. The tiny 
traces of relevant control information are outweighted by measurement noise. 



6. Discussion 

The presented scheme was just a mathematical model applying engineering intuition for 
reaching good control, and the main goal when constructing the model was simplicity. 
However, it seems that there is a connection to real neuronal systems. 

It has been observed (Hay kin, 1999; Hyotyniemi, 2004) that simple Hebbian/anti-Hebbian 
learning in neuronal grids results in a principal subspace model of the input data. This 
Hebbian learning is the principle that is implemented by the simple neurons (Hebb, 1949). 
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The principal subspace captures the principal components, and it is a mathematically valid 
basis for implementations of principal component regressions. 

In (Hyotyniemi, 2006) the Hebbian learning principle is extended by applying feedback 
through environment. It turns out that when the nonideality is taken into account - exploiting 
signals also exhausts them - there is a negative feedback in the system; one can reach 
stabilization of the Hebbian adaptation without additional nonlinearities, and emergence of 
the principal subspace without complicated hierarchies among neurons. There can exist self- 
regulation and self-organization in the neuronal system, meaning that the adaptation of the 
global model can be based solely on the local interactions between individual neurons. 
But as the biped structure is highly nonlinear, one needs to extend the linear model; here 
this was accomplished by introducing the clustered structure with submodels. How could 
such extension be motivated in a neuronal system, as it now seems that some kind of central 
coordination is necessary to select among the submodels and to master their adaptation? 
Again, as studied in (Hyotyniemi, 2006), there emerges sparse coding in a system with the 
Hebbian feedback learning. It can be claimed that in sparse coding the basis vectors are 
rotated to better match the physically relevant features in data - such behaviour has been 
detected, for example, in visual cortex (Foldiak, 2002). In Hebbian feedback learning the 
correlating structures become separated, and they compete for activation; without any 
centralized control structures, the signals become distributed among the best matching 
substructures. As seen from outside, the net effect is to have "virtual" clusters, with smooth 
interpolation between them. 

7. Conclusions 

In this work, a clustered regression structure was used to model and control a walking 
biped robot model. It was shown that the purely data-based model is accurate enough to 
control the biped. The control structures can also be motivated from the physiological point 
of view. 

The main problem is that to successfully reproduce the walking gait, the clustered 
regression controller should learn to keep the system well near the nominal trajectory. If the 
state of the system drifts too far from the learned behaviour, the validity of the local models 
strongly weakens and the system collapses. As a result, the robustness of the controller is 
dependent on the amount of source data variation obtained by additional noise. However, 
the clustered regression structure was unable to control the biped with the required noise 
level present in the PD controlled simulations. This complicates the iterative optimization 
process. 

It was also shown that the management of no less than eight principal components is 
necessary; the "visibility ratio" between these principal components, or the ratio between 
variances, is over three decades. This also dictates the necessary signal-to-noise ratio. It 
seems that such accuracy cannot be achieved in biological systems; to construct a 
biologically plausible control scheme, the model structure has to be modified. 
There are various directions to go. Principal components are always oriented towards the 
input data only, neglecting the output, or the outcome of the actions. This problem should 
somehow be attacked. One possibility would be to explicitly control the adaptation based on 
the observed outputs, so that rather than doing standard principal component analysis, 
some kind of partial least squares (PLS) (Hyotyniemi, 2001) approach would be implemented. 
Also, the data could be rescaled to emphasize the less visible PC directions. 
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It is also important to point out that the physical model of the biped robot is only a model, 
and some phenomena present in the data may be caused by the simulator. Especially the leg 
contacts with the ground may introduce some oscillations which probably would not 
appear in data collected from a real robot. This means that some of the less important 
principal component directions may as well describe these irrelevant effects thus making 
the modelling problem harder. In the future work, it would be interesting to analyze the 
principal component directions in each local model one by one and try to find out which of 
them are really connected to the actual walking motion. 
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1. Introduction 



Sticky Hands is a unique physically-cooperative exercise that was implemented with a full- 
size humanoid robot. This involved the development of a novel biologically-inspired 
learning algorithm capable of generalizing observed motions and responding intuitively. 
Since Sticky Hands has thus far been performed as an exercise for the mutual development 
of graceful relaxed motion and comfortable physical intimacy between two humans, 
performing the exercise with a humanoid robot represents a conceptual advance in the role 
of humanoid robots to that of partners for human self-development. 

Engendering a sense of comfortable physical intimacy between human and robot is a 
valuable achievement: humans must be able to interact naturally with humanoid robots, 
and appreciate their physical capabilities and requirements in given situations in order to 
cooperate physically. Humans are attuned to performing such assessments with regard to 
other humans based on numerous physical and social cues that it will be essential to 
comprehend and replicate in order to facilitate intuitive interaction. 

The chapter expands these issues in more detail, drawing attention to the relevance of 
established research in motion control for computer graphics and robotics, human motion 
performance, and perceptual psychology. In particular, attention is paid to the relevance of 
theories of human motion production, which inspire biomimetic motion performance 
algorithms, and experimental results shedding light on the acuity of human perception of 
motion and motion features. We address the following questions: How do we interact 
naturally, instinctively, intuitively, and effectively with the most adaptable of multipurpose 
machines, i.e., humanoid robots? How do we make interactions natural, graceful, and 
aesthetically pleasing? How do we encourage human attribution in the perception of 
humanoid robots? How can we expand the concept of humanoid robots through novel 
applications? How can we draw on knowledge already existing in other fields to inspire 
developments in humanoid robotics? 

It is with this perspective, and by way of reply to these inquiries, that we begin this chapter 
by describing the implementation of the Sticky Hands system, its intuitive learning and 
generalization system, and hardware. We illustrate how a study of biological processes 
inspired various aspects of the system, such as the plastic memory of its learning system. 
We present experimental results and evaluations of the Sticky Hands system. 
We conclude the chapter by returning to the main philosophical themes presented in the 
paper, and describe a fruitful future direction for humanoid robotics research: we conclude 
that there is a synergistic relationship between neuroscience and humanoid robotics with 



266 Humanoid Robots, New Developments 

the following four benefits: (i) neuroscience inspires engineering solutions e.g., as we have 
illustrated, through human motor production and motion perception research (although the 
breadth of relevant knowledge is much wider than the scope of these two examples); (ii) 
engineering implementations yield empirical evaluations of neuroscientific theories of 
human capabilities; (iii) engineering implementations inspire neuroscientific hypotheses; 
(iv) engineering solutions facilitate new neuroscientific experimental paradigms, e.g., 
through the recording, analysis and replication of motion for psychovisual experiments. 
We believe that exploiting this synergy will yield rapid advances in both fields, and 
therefore advocate parallel neuroscientific experimentation and engineering development in 
humanoid robotics. Regarding research building on the Sticky Hands system, experiments 
involving physical interaction will thus further the development of humanoids capable of 
interacting with humans, and indeed humans capable of interacting with humanoids. 

2. The Sticky Hands game 

'Sticky Hands' was drawn from Tai Chi practice, which often includes physical contact 
exercises with a partner. The game involves forming contact with a partner gently and 
maintaining it while moving in a previously undetermined pattern. The goal is to develop 
an ability to perform relaxed and graceful motion, through learning to be sensitive to the 
forces transmitted via contact and intuitively predict one's partner's movements. When one 
partner yields the other must push, and vice versa. Prolonged practice reveals the 
development of intuition and conditioned responses so that the contact may be preserved 
with a very slight force throughout complex spontaneous sequences of motion. Aspects of 
the game include becoming comfortable with physical contact, a mutual goal of personal 
development, and a fulfilling and calming influence. Some people therefore regard it as a 
form of spiritual development. It is possible for an expert to educate a beginner by 
encouraging them to perform graceful and rewarding movements, and breaking down the 
tension in the student's motion. 

Our goal was to have a humanoid robot take the role of one partner and play Sticky Hands 
with a human. In order to rationalise the interaction, we defined a specific variant of the 
game involving contact using one hand. Partners stand and face each other, raise one hand 
to meet their partner's and begin making slow circling motions. The path of the contact 
point may then diverge into a spontaneously developing trajectory as the partners explore 
their range of physical expression. The robot DB may be seen playing the game in Fig. 1. 




Fig. 1. Playing Sticky Hands with humanoid robot DB. 
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Sticky Hands, being a novel interaction, presents several interesting challenges for the 
design of an intelligent system, including complex motion planning that mimics the 
ineffable quality of human intuition. As a basis the robot must be capable of moving while 
remaining compliant to contact forces from a human, and refining this behaviour, be able to 
learn from and mimic the motion patterns of humans playing the game. By imitating the 
types of pattern produced by humans, the robot may reflect creativity and encourage people 
to explore their own range of motion. To accomplish this while simultaneously experiencing 
new motion patterns that may develop unpredictably it is necessary to provide a learning 
algorithm capable of generalising motion it has seen to new conditions, maintaining a 
suitably evolving internal state. 

This work was motivated by a desire to explore physical interaction between humans and 
robots. This desire grew from the observation that the arising problem for the design of 
appropriate intelligent systems is how to communicate and cooperate with people (Takeda 
et al.1997). Using the principle that physical interaction is a familiar and reliable method for 
most humans, Sticky Hands broaches the area of physical cooperation and the subtle but 
significant issue of communication (Coppin et al. 2000) through physical movement (Adams 
et al. 2000; Hikiji 2000). Our work therefore considers human imitation (Scassellati 2000), 
which we consider to be of particular relevance for the future -as computer science and 
robotics develop it becomes clear that humans and robots will cooperate with a wide range 
of tasks. Moreover, we explored the use of a humanoid robot as a playmate facilitating a 
human's self -development. In this context the robot assumes a new social role involving a 
physically intimate and cooperative interaction. We can also hope that through the 
interaction, people will be encouraged to attribute the robot an anthropomorphic identity 
rather than considering it as a mechanical entity. Such a shift of perspective heralds ways of 
making human and robot interactions more natural. 

Humanoid robotics embodies a certain fascination with creating a mechanical entity 
analogous to our human selves. There are of course many other valid motivations for creating 
humanoids (Bergener et al. 1997), not least among which is the nature of our environment - 
being highly adapted to human sensory and motor capabilities it begs for artificial agents with 
analogous capabilities that can usefully coexist in our own living and working spaces. Having 
an anthropomorphic shape and organic motion also makes working with such robots 
aesthetically and emotionally more pleasing. The production of human-like, and emotionally 
expressive styles of movement are of particular relevance. The Sticky Hands interaction 
embodies human-like motion and autonomy. The target of maintaining minimal contact force 
is tangible. The creative, anticipatory aspect however, is enhanced by initiative. The challenges 
posed by these problems motivated the development of a highly generalized learning 
algorithm, and a theoretical investigation of expressive motion styles. 

We continue this section with a system overview describing the relationship between robot 
control and learning in the Sticky Hands system. We then outline the robot control issues, 
describing additional sensing technology and explaining how we achieved hand placement 
compliant to external forces. We then discuss the learning algorithm which observes 
trajectories of the hand throughout interaction with a human and predicts the development 
of a current trajectory. 

2.1. System overview 

The control system for robotic Sticky Hands was treated as three components which are 
shown in Fig. 2. The robot motor controller is responsible for positioning the hand, and obeys 
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a planned trajectory supplied by the learning algorithm. It also estimates the contact force 
between the human's and robot's hands, adjusting the trajectory plan to compensate for 
contact force discrepancies. The actual hand position data was smoothed to remove noise, 
and sent back to the learning algorithm. 
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Fig. 2. System breakdown. 

The path learning and prediction component is a learning algorithm that outputs a predicted 
hand trajectory and processes the actual observed trajectories supplied by the motor 
controller. The learning algorithm observes the evolution of the hand trajectory 
continuously. It learns motion patterns, and generalises them to predict future 
developments in the hand trajectory. The input and output are both sequences of position 
vectors. The robot controller makes use of the posture controller. The posture controller uses a 
straightforward inverse kinematics routine to generate joint configurations satisfying 
Cartesian hand placement targets. 



2.2 Robot control 

The Sticky Hands exercise was performed by a 30 DOF SARCOS anthropomorphic robot 
(Atkeson et al. 2000) that may be seen in Fig. 3. Each joint is powered hydraulically, and has 
angle and load sensors. Joints are servoed independently by the low-level controller using 
torques proportional to the angular offset between the measured and target angles, and 
negatively proportional to the angular velocity at each joint. This yields proportional 
gains/ spring-damper control at each joint, where the torque at a given joint is calculated as: 



T = k s {9,-9) + k d {9,-9) 



(1) 



6f and &t are the target angle and angular velocity (usually Of — ), and 9 are the 
current angle and angular velocity. k s and kj are the spring stiffness and damping 
parameters respectively. 

Oscillations caused by this proportional gains controller were avoided by employing an 
inverse dynamics algorithm to estimate the torques necessary to hold a position. Since the 
robot is anchored off the ground by its pelvis, standing and balancing did not constitute 
problems. The Sticky Hands exercise involves only one hand and the chain of joints from the 
anchor point to the robot's hand encompasses 10 DOFs. The chain is kinematically 
redundant, so an iterative inverse kinematics algorithm was used (Tevatia & Schaal 2000). 
The 20 unused DOFs were resolved according to a default posture. 
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Fig. 3. DB kinematics. 



The robot was required to balance a force applied by the human player against its hand. 
Trajectories were performed while incorporating any adjustments needed to balance the 
contact force, thus making the hand actively compliant to changes of the contact force. The 
simplest method of determining the contact force is direct measurement, i.e., by means of a 
force transducer between the robot's and human's hands. Fig. 4 shows the attachment of a 
force transducer between the robot's hand and a polystyrene hemisphere intended to 
facilitate an ergonomic surface for the human to contact. The transducer measured forces in 
the X, Y and Z directions. The target position of the hand was translated to compensate if 
the contact force exceeded a 5N threshold. In order to establish a balance point against the 
force applied by the human, we subtracted a small quantity in the Z (forward-backward) 
direction from the measured force prior to thresholding. We also implemented a method of 
responding to the contact force using only the sensors internal to the SARCOS robot, i.e., 
joint load and angle. In both cases we assumed that the interactions were not so forceful as 
to necessitate more than a constant adjustment in hand position, i.e., continuous pressure 
would result in a yielding movement with a constant velocity. 



Polystyrene Hemisphere 



Robot's Palm 



Velcro Attachment 
Fig. 4. Force transducer attachment. 




Force Transducer 



It is possible to calculate the externally applied force using the internal sensors of DB. This 
involves measuring the joint angles, estimating the torques necessary to hold the position 
using inverse dynamics, and subtracting the measured loads. Any discrepancy should be 
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torques due to external loads other than gravity, because gravity is the only external force 
accounted for in the inverse dynamics calculation. The method relies on having a very 
accurate inverse dynamics model, and unfortunately inaccuracies in the dynamic model and 
load sensors necessitated a large contact force to facilitate suitable thresholding of the 
results. 

We therefore measured the positional offset between target and actual hand positions 
instead, assuming any large positional discrepancy was caused by external non-gravity 
forces. We were able to threshold by 2cm (yielding an effective force threshold of about 
12N). When the low-level controller was required to follow a suggested trajectory, the 
hand was directed to 5cm further forward from the suggested position. The human was 
assumed to supply a force sufficient to maintain the hand in the actual target position 
specified by the suggested trajectory. This method did not require readings from the joint 
load sensors to respond to external forces because physical interactions were registered 
purely through the position of the robot's hand -which requires only joint angle readings 
to compute. If the human reduced or increased the contact force the robot's hand would 
move beyond the 2cm threshold and the suggested trajectory consequently adjusted to 
compensate. While the contact force was entirely in the Z direction, perturbations in the X 
and Y as well as Z directions were monitored, and accommodated by translating the 
target position when the 2cm threshold was reached. This indirect kinematic method 
facilitated the use of a significantly lighter contact force than with inverse dynamics based 
force estimation. 

Any reasonable positioning and compliance strategy is in fact compatible with the high- 
level Sticky Hands control system. Ideally, the force threshold is minimal, since the 
threshold determines the minimum contact force. The larger the contact force the less 
relaxed the human's motion may be. The way the redundancy in robot hand placement is 
resolved does not significantly affect the contact force but on the other hand may have an 
effect on the appearance of the robot's motion. We discuss this point later. We also compare 
this kinematic hand positioning technique with the force transducer method and present 
traces of the forces measured during interaction with a human. 

The trajectories supplied by the learning algorithm were described using piecewise linear 
splines. The robot controller ran at 500Hz and the learning algorithm ran at 10Hz. The 
sequence of predictions output by the learning algorithm was interpreted as the advancing 
end-point of a spline. The knots of this spline were translated to compensate for any 
discrepancy in the contact force detected by the motor controller. This translation was 
accomplished smoothly, in order to prevent the hand from jerking in response to contact 
forces: after translating the knots, a negative translation vector was initialized to bring the 
knots back to their original position. This negative translation was gradually decayed to 
zero during each cycle of the motor controller. The sum of the translated knots and the 
negative translation vector thus interpolated from the original knot positions and the 
translated knot positions. 

2.3 'Prototype set' learning algorithm 

3D point samples describing the robot's hand trajectory were fed as input into the learning 
algorithm. A vector predicting the progression of the trajectory was output in return for 
each sample supplied. The learning algorithm fulfilled the following properties, which are 
required by the nature of the Sticky Hands exercise. 
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• Generalise observed trajectories for prediction of similar new trajectories with 
different orientation, scale, curvature, position or velocity 

• Extrapolate properties of a new trajectory for prediction in the absence of similar 
observed trajectories 

• Maintain a fluid internal state in order to cope with the evolving nature of trajectories 
through continuous update, replacement and 'forgetting' of recorded information 

• Handle branch points where similar observed trajectories diverge 

• Tolerate noise causing inaccuracy in position samples 

• Facilitate a parameterisable time bound, ensuring real time operation 

• Facilitate a parameterisable memory bound, in order to fully exploit the host 
architecture 

We refer to the structure used to record the instantaneous properties of the input trajectory 
as a 'prototype'. This paradigm may be compared to the work of Stokes et al. (1999) who 
presented a method for identifying cyclic patterns and their significance in space-line 
samples. Our process focuses rather on the immediate instant of a trajectory. Salient features 
are recorded for efficient retrieval but no internal classifications of higher level structures 
such as cycles are made. This is the essence of the generalisation and branch point handling 
properties of our algorithm since the recorded properties of any instant of an observed 
trajectory may be used to predict the development of any new trajectory. Moreover, no 
information about correlations between trajectories is maintained in an explicit form by the 
prediction process. 

The 'prototype' is defined mathematically below and the creation of prototypes from raw 
geometrical information is presented. The utilisation of prototypes for prediction and 
extrapolation is also demonstrated. Then the issue of how to select the most appropriate 
prototype for predicting a given trajectory from a memory bank of prototypes is 
addressed. The memory bank is maintained according to a reinforcement principle 
designed to ensure an efficient use of memory by 'forgetting' prototypes that are not 
necessary. This search procedure requires a distance metric between prototypes and has 
been optimised. The reader may find it useful to refer to Fig. 5 throughout this prototype 
learning section. 

2.3.1 Prediction using prototypes 

Given a sequence of input position samples, {pt.:£eN} . The prototype P, corresponding to 

Pj is defined using p._j ,p i and p i+l : 

P i =(p„v i ,a„T,) (2) 

v , =ft+i-ft eR3 ( 3 ) 

(4) 
a . JPM-Pi\ eR 
Pi ~ Pi-i 

( 5 ) 
T t = cosl y J, sinl -J- \p M - Pi ) x ( Pi - Pi _ x ) 

C0S -if(P,-P,-i).(p, +1 -ft)| eR (6) 

Pi-Pi-\\\P,+\-Pi\ 
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Vj is the velocity out of Pj (scaled by an arbitrary factor) and #; is a scalar indicating the 

magnitude of the acceleration. The direction of the acceleration is deducible from 7/ , which 

is a quaternion describing the change in direction between Vj and v i-\ as a rotation 
through their mutually orthogonal axis. 
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Fig. 6. Trajectory prediction using a prototype. 
The progression of a trajectory {p' k '■ k <e N} at a given instant may be predicted using a 

prototype. Suppose that for a particular trajectory sample p' j , it is known that Pj 
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corresponds best to Pj , then P' j +a i^i(p' j~P' j-\) is an estimate for P j+\ ■ Pre- 
multiplication of a 3-vector by T\ denotes quaternion rotation in the usual way. This 
formula applies the bend and acceleration occurring at P\ to predict the position of P j . 
We also linearly blend the position of Pi into the prediction, and the magnitude of the 
velocity so that P j combines the actual position and velocity of Pi with the prediction 
duplicating the bending and accelerating characteristics of Pi (see Fig. 6): 

P'j-P'j-\ (> 

P'j+l = P'j +s j T r—, ; 1 + Sp (Pi - P'j ) 

\P ,—P ;-ll 



Sj=(i-g v )a i \p'j-p' j _ l \ 



(8) 



S p and g v are blending ratios used to manage the extent to which predictions are entirely 
general, or repeat previously observed trajectories, i.e., how much the robot wants to repeat 
what it has observed. We chose values of S n and g v in the range [0.1, 0.001] through 

empirical estimation, g p describes the tendency of predictions to gravitate spatially 

towards recorded motions, and g v has the corresponding effect on velocity. 
In the absence of a corresponding prototype we can calculate P j-\ , and use it to estimate 
P' j+1 , thus extrapolating the current characteristics of the trajectory. Repeated 
extrapolations lie in a single plane determined by Pj_2,Pj-i and P t , and maintain the 
trajectory curvature (rotation in the plane) measured at p'j . We must set g p - since 
positional blending makes no sense when extrapolating, and would cause the trajectory to 
slow to a halt, i.e., the prediction should be based on an extrapolation of the immediate 
velocity and turning of the trajectory and not averaged with its current position since there 
is no established trajectory to gravitate towards. 

2.3.2 Storage and retrieval 

Ideally, when predicting P j+\ , an observed trajectory with similar characteristics to those 
at p'j is available. Typically a large set of recorded prototypes is available, and it is 
necessary to find the closest matching prototype P; or confirm that no suitably similar 
prototype exists. The prototype P' j—\ which is generated from the current trajectory can be 
used as a basis for identifying similar prototypes corresponding to similar, previously 
observed trajectories. We define a distance metric relating prototypes in order to 
characterise the closest match. 

I I (9) 

rf(f„P,) = l-cos(g')+ ' y 

M p 

Where, 
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0e[-Af a ,Af fl ]=>0'=0 ;T 



1M « (10) 

0e[-M a> M o ] =>*•=» 

^COS'' VrVj (11) 

I v, || v, I 
M fl and M„ define the maximum angular and positional differences such that d(Pj,P.) may 

be one or less. Prototypes within this bound are considered similar enough to form a basis 
for a prediction, i.e., if d(P i , P: ) is greater than 1 for all i then no suitably similar prototype 

exists. The metric compares the position of two prototypes, and the direction of their 
velocities. Two prototypes are closest if they describe a trajectory traveling in the same 
direction, in the same place. In practice, the values of 15cm and nl 4 radians for M „ and M a 

respectively were found to be appropriate. -A trajectory with exactly the same direction as 
the developing trajectory constitutes a match up to a displacement of 15cm, a trajectory with 
no displacement constitutes a match up to an angular discrepancy of nl 4 radians, and 
within those thresholds there is some leeway between the two characteristics. The threshold 
values must be large enough to permit some generalisation of observed trajectories, but not 
so large that totally unrelated motions are considered suitable for prediction when 
extrapolation would be more appropriate. 

The absolute velocity, and bending characteristics are not compared in the metric. 
Predictions are therefore general with respect to the path leading a trajectory to a certain 
position with a certain direction and velocity, so branching points are not problematic. Also 
the speed at which an observed trajectory was performed does not affect the way it can be 
generalised to new trajectories. This applies equally to the current trajectory and previously 
observed trajectories. 
When seeking a prototype we might naively compare all recorded prototypes with i".-_i 

to find the closest. If none exist within a distance of 1 we use P' .-_j itself to extrapolate as 

above. Needless to say however, it would be computationally over-burdensome to 
compare P" i-\ with all the recorded prototypes. To optimise this search procedure we 

defined a voxel array to store the prototypes. The array encompassed a cuboid enclosing 
the reachable space of the robot, partitioning it into a 50x50x5() array of cuboid voxels 
indexed by three integer coordinates. The storage requirement of the empty array was 
0.5MB. New prototypes were placed in a list attached to the voxel containing their 
positional component p- t . Given P._j we only needed to consider prototypes stored in 

voxels within a distance of M from p -_j since prototypes in any other voxels would 

definitely exceed the maximum distance according to the metric. Besides limiting the total 
number of candidate prototypes, the voxel array also facilitated an optimal ordering for 
considering sets of prototypes. The voxels were considered in an expanding sphere about 
p.- . A list of integer- triple voxel index offsets was presorted and used to quickly identify 

voxels close to a given centre voxel ordered by minimum distance to the centre voxel. The 
list contained voxels up to a minimum distance of M „ . This ensures an optimal search of 

the voxel array since the search may terminate as soon as we encounter a voxel that is too 
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far away to contain a prototype with a closer minimum distance than any already found. 
It also permits the search to be cut short if time is unavailable. In this case the search 
terminates optimally since the voxels most likely to contain a match are considered first. 
This facilitates the parameterisable time bound since the prototype search is by far the 
dominant time expense of the learning algorithm. 

2.3.3 Creation and maintenance 

Prototypes were continually created based on the stream of input position samples 
describing the observed trajectory. It was possible to create a new prototype for each new 
sample, which we placed in a cyclic buffer. For each new sample we extracted the average 
prototype of the buffer to reduce sampling noise. A buffer of 5 elements was sufficient. The 
averaged prototypes were shunted through a delay buffer, before being added to the voxel 
array. This prevented prototypes describing a current trajectory from being selected to 
predict its development (extrapolation) when other prototypes were available. The delay 
buffer contained 50 elements, and the learning algorithm was iterated at 10Hz so that new 
prototypes were delayed by 5 seconds. 

Rather than recording every prototype we limited the total number stored by averaging 
certain prototypes. This ensures the voxel array does not become clogged up and slow, and 
reduces the memory requirement. Therefore before inserting a new prototype into the voxel 
array we first searched the array for a similar prototype. If none was found we added the 
new prototype, otherwise we blended it with the existing one. We therefore associated a 
count of the number of blends applied to each prototype to facilitate correct averaging with 
new prototypes. In fact we performed a non-linear averaging that capped the weight of the 
existing values, allowing the prototypes to tend towards newly evolved motion patterns 
within a limited number of demonstrations. Suppose P a incorporates « blended prototypes, 
then a subsequent blending with P b will yield: 



" D(n) b D(n) 



p ;= p a ^rr + p 4 — (12) 



D(n) = \ + A M --^- (13) 

1 + nA G v ; 

A M /(l + nA M ) defines the maximum weight for the old values, and a q determines how 

quickly it is reached. Values of 10 and 0.1 for A u and a g respectively were found to be 
suitable. This makes the averaging process linear as usual for small values but ensures the 
contribution of the new prototype is worth at least 1/11*. 

We facilitated an upper bound on the storage requirements using a deletion indexing 
strategy for removing certain prototypes. An integer clock was maintained, and 
incremented every time a sample was processed. New prototypes were stamped with a 
deletion index set in the future. A list of the currently stored prototypes sorted by deletion 
index was maintained, and if the storage bounds were reached the first element of the list 
was removed and the corresponding prototype deleted. The list was stored as a heap 
(Cormen et al.) since this data structure permits fast 0(log(num elements)) insertion, 
deletion and repositioning. We manipulated the deletion indices to mirror the 
reinforcement aspect of human memory. A function R(n) defined the period for which a 
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prototype reinforced n times should be retained (n is equivalent to the blending count). 
Each time a prototype was blended with a new one we calculated the retention period, 
added the current clock and re-sorted the prototype index. ^(") increases exponentially 
up to a maximum asymptote. 

D M gives the maximum asymptote. d g and D p determine the rate of increase. Values of 
20000, 0.05 and 2 were suitable for D M' D G ar >d D p respectively. The initial reinforcement 
thus extended a prototype's retention by 2 minutes, and subsequent reinforcements roughly 
doubled this period up to a maximum of about half an hour (the algorithm was iterated at 
10Hz). 

3. Results 

The initial state and state after playing Sticky Hands with a human partner are shown in Fig. 
7. Each prototype is plotted according to its position data. The two data sets are each viewed 
from two directions and the units (in this and subsequent figures) are millimeters. The X, Y 
& Z axes are positive in the robot's left, up and forward directions respectively. The point 
(0,0,0) corresponds to the robot's sacrum. The robot icons are intended to illustrate 
orientation only, and not scale. Each point represents a unique prototype stored in the 
motion predictor's memory, although as discussed each prototype may represent an 
amalgamation of several trajectory samples. The trajectory of the hand loosely corresponds 
to the spacing of prototypes but not exactly because sometimes new prototypes are blended 
with old prototypes according to the similarities between each's position and velocity 
vectors. 

The initial state was loaded as a default. It was originally built by teaching the robot to 
perform an approximate circle 10cm in radius and centred in front of the left elbow joint 
(when the arm is relaxed) in the frontal plane about 30cm in front of the robot. The 
prototype positions were measured at the robot's left hand, which was used to play the 
game and was in contact with the human's right hand throughout the interaction. The 
changes in the trajectory mostly occur gradually as human and robot slowly and 
cooperatively develop cycling motions. Once learned, the robot can switch between any 
of its previously performed trajectories, and generalise them to interpret new 
trajectories. 

The compliant positioning system, and its compatibility with motions planned by the 
prediction algorithm was assessed by comparing the Sticky Hands controller with a 
'positionable hand' controller that simply maintains a fixed target for the hand in a 
compliant manner so that a person may reposition the hand. 

Fig. 8 shows a force/ position trace where the width of the line is linearly proportional to 
the magnitude of the force vector (measured in all 3 dimensions), and Table 1 shows 
corresponding statistics. Force measurements were averaged over a one minute period of 
interaction, but also presented are 'complied forces', averaging the force measurements 
over only the periods when the measured forces exceeded the compliance threshold. 
From these results it is clear that using the force transducer yielded significantly softer 
compliance in all cases. Likewise the 'positionable hand' task yielded slightly softer 
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compliance because the robot did not attempt to blend its own trajectory goals with those 
imposed by the human. 

Initial State 



z % 





Human Interaction 




*ViN 



Y ' : X 

Fig. 7. Prototype state corresponding to a sample interaction 
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Sticky Hands 






Fig. 8. Force measured during 'positionable hand' and Sticky Hands tasks. 
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Task 


Contact force (N) 


Complied forces (N) 


Mean 


Var. 


Mean 


Var. 


Force Transducer Sticky Hands 


4.50 


4.83 


5.72 


4.49 


Force Transducer 'Positionable Hand' 


1.75 


2.18 


3.23 


2.36 


Kinematically Compliant Sticky Hands 


11.86 


10.73 


13.15 


10.73 


Kinematically Compliant 'Positionable Hand' 


8.90 


10.38 


12.93 


11.40 



Table 1 Forces experienced during 'positionable hand' and Sticky Hands tasks. 

Examining a sequence of interaction between the robot and human reveals many of the learning 
system's properties. An example sequence during which the robot used the kinematic 
compliance technique is shown in Fig. 9. The motion is in a clockwise direction, defined by 
progress along the path in the a-b-c direction, and was the first motion in this elliptical pattern 
observed by the prediction system. The 'Compliant Adjustments' graph shows the path of the 
robot's hand, and is marked with thicker lines at points where the compliance threshold was 
exceeded, i.e., points where the prediction algorithm was mistaken about the motion the human 
would perform. The 'Target Trajectory' graph shows in lighter ink the target sought by the 
robot's hand along with in darker ink the path of the robot's hand. The target is offset in the Z 
(forwards) direction in order to bring about a contact force against the human's hand. At point 
(a) there is a kink in the actual hand trajectory, a cusp in the target trajectory, and the beginning 
of a period during which the robot experiences a significant force from the human. This kink is 
caused by the prediction algorithm's expectation that the trajectory will follow previously 
observed patterns that have curved away in the opposite direction, the compliance mamtaining 
robot controller adjusts the hand position to attempt to balance the contact force until the 
curvature of the developing trajectory is sufficient to extrapolate its shape and the target 
trajectory well estimates the path performed by the human. At point (b) however, the human 
compels the robot to perform an elliptical shape that does not extrapolate the curvature of the 
trajectory thus far. At this point the target trajectory overshoots the actual trajectory due to its 
extrapolation. Once again there is a period of significant force experienced against the robot's 
hand and the trajectory is modified by the compliance routine. At point (c) we observe that, 
based on the prototypes recorded during the previous ellipse, the prediction algorithm correctly 
anticipates a similar elliptical trajectory offset positionally and at a somewhat different angle. 



Compliant Adjustments 



Target Trajectory 

_ (0 





Fig. 9 Example interaction showing target trajectory and compliance activation 



4. Discussion 

We proposed the 'Sticky Hands' game as a novel interaction between human and robot. The 
game was implemented by combining a robot controller process and a learning algorithm with a 
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novel internal representation. The learning algorithm handles branching trajectories implicitly 
without the need for segmentation analysis because the approach is not pattern based. It is 
possible to bound the response time and memory consumption of the learning algorithm 
arbitrarily within the capabilities of the host architecture. This may be achieved trivially by 
restricting the number of prototypes examined or stored. The ethos of our motion system may be 
contrasted with the work of Williamson (1996) who produced motion controllers based on 
positional primitives. A small number of postures were interpolated to produce target joint 
angles and hence joint torques according to proportional gains. Williamson's work advocated the 
concept of "behaviours or skills as coarsely parameterised atoms by which more complex tasks 
can be successfully performed". Corresponding approaches have also been proposed in the 
computer animation literature, such as the motion verbs and adverbs of Rose et al. (1998). 
Williamson's system is elegant, providing a neatly bounded workspace, but unfortunately it was 
not suitable for our needs due to the requirements of a continuous interaction incorporating 
more precise positioning of the robot's hand. 

By implementing Sticky Hands, we were able to facilitate physically intimate interactions 
with the humanoid robot. This enables the robot to assume the role of playmate and 
partner assisting in a human's self-development. Only minimal sensor input was required 
for the low-level motor controller. Only torque and joint position sensors were required, 
and these may be expected as standard on most humanoid robots. With the addition of a 
hand mounted force transducer the force results were also obtained. Our work may be 
viewed as a novel communication mechanism that accords with the idea that an 
autonomous humanoid robot should accept command input and maintain behavioral 
goals at the same level as sensory input (Bergener et al. 1997). Regarding the issue of 
human instruction however, the system demonstrates that the blending of internal goals 
with sensed input can yield complex behaviors that demonstrate a degree of initiative. 
Other contrasting approaches (Scassellati 1999) have achieved robust behaviors that 
emphasize the utility of human instruction in the design of reinforcement functions or 
progress estimators. 

The design ethos of the Sticky Hands system reflects a faith in the synergistic relationship 
between humanoid robotics and neuroscience. The project embodies the benefits of cross- 
fertilized research in several ways. With reference to the introduction, it may be seen that (i) 
neuroscientific and biological processes have informed and inspired the development of the 
system, e.g., through the plastic memory component of the learning algorithm, and the 
control system's "intuitive" behaviour which blends experience with immediate sensory 
information as discussed further below; (ii) by implementing a system that incorporates 
motion based social cues, the relevance of such cues has been revealed in terms of human 
reactions to the robot. Also, by demonstrating that a dispersed representation of motion is 
sufficient to yield motion learning and generalization, the effectiveness of solutions that do 
not attempt to analyze nor segment observed motion has been confirmed; (iii) technology 
developed in order to implement Sticky Hands has revealed processes that could plausibly 
be used by the brain for solving motion tasks, e.g., the effectiveness of the system for 
blending motion targets with external forces to yield a compromise between the motion 
modeled internally and external influences suggests that humans might be capable of 
performing learned motion patterns according to a consistent underlying model subject to 
forceful external influences that might significantly alter the final motion; (iv) the Sticky 
Hands system is in itself a valuable tool for research since it provides an engaging 
cooperative interaction between a human and a humanoid robot. The robot 's behaviour 
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may be modulated in various ways to investigate for example the effect of less compliant 
motion, different physical cues, or path planning according to one of various theories of 
human motion production. 

The relationship between the engineering and computational aspect of Sticky Hands and the 
neuroscientific aspect is thus profound. This discussion is continued in the following 
sections which consider Sticky Hands in the context of relevant neuroscientific fields: 
human motion production, perception, and the attribution of characteristics such as 
naturalness and affect. The discussion is focused on interaction with humans, human 
motion, and lastly style and affect. 

4.1 Interacting with humans 

The Sticky Hand task requires two partners to coordinate their movements. This type of 
coordination is not unlike that required by an individual controlling an action using both 
their arms. However, for such bimanual coordination there are direct links between the two 
sides of the brain controlling each hand. Though surprisingly, even when these links are 
severed in a relatively rare surgical intervention known as callosotomy, well-learned 
bimanual processes appear to be remarkably unaffected (Franz, Waldie & Smith, 2000). This 
is consistent with what we see from experienced practitioners of Tai Chi who perform Sticky 
Hands: that experience with the task and sensory feedback are sufficient to provide graceful 
performance. It is a reasonable speculation that the crucial aspect of experience lays in the 
ability to predict which movements are likely to occur next, and possibly even what sensory 
experience would result from the actions possible from a given position. 
A comparison of this high level description with the implementation that we used in the Sticky 
Hands task is revealing. The robot's experience is limited to the previous interaction between 
human and robot and sensory information is limited to either the kinematics of the arm and 
possibly also force information. Clearly the interaction was smoother when more sensory 
information was available and this is not entirely unexpected. However, the ability of the robot 
to perform the task competently with a very minimum of stored movements is impressive. 
One possibility worth considering is that this success might have been due to a fortunate 
matching between humans' expectations of how the game should start and the ellipse that the 
robot began with. This matching between human expectations and robot capabilities is a 
crucial question that is at the heart of many studies of human-robot interaction. 
There are several levels of possible matching between robot and human in this Sticky Hands 
task. One of these, as just mentioned is that the basic expectations of the range of motion are 
matched. Another might be that the smoothness of the robot motion matches that of the 
human and that any geometric regularities of motion are matched. For instance it is known 
that speed and curvature are inversely proportional for drawing movements (Lacquaniti et 
al. 1983) and thus it might be interesting in further studies to examine the effect of this factor 
in more detail. A final factor in the relationship between human and robot is the possibility 
of social interactions. Our results here are anecdotal, but illustrative of the fact that 
secondary actions will likely be interpreted in a social context if one is available. One early 
test version of the interaction had the robot move its head from looking forward to looking 
towards its hand whenever the next prototype could not be found. From the standpoint of 
informing the current state of the program this was useful. However, there was one 
consequence of this head movement that likely was exacerbated by the fact that it was the 
more mischievous actions of the human partner that would confuse the robot. This lead the 
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robot head motion to fixate visually on its own hand, which by coincidence was where most 
human partners were also looking, leading to a form of mutual gaze between human and 
robot. This gestural interaction yielded variable reports from the human players as either a 
sign of confusion or disapproval by the robot. 

This effect is illustrative of the larger significance of subtle cues embodied by human motion that 
may be replicated by humanoid robots. Such actions or characteristics of motion may have 
important consequences for the interpretation of the movements by humans. The breadth of 
knowledge regarding these factors further underlines their value. There is much research 
describing how humans produce and perceive movements and many techniques for producing 
convincing motion in the literature of computer animation. For example, there is a strong duality 
between dynamics based computer animation and robotics (Yamane & Nakamura 2000). 
Computer animation provides a rich source of techniques for generating (Witkin & Kass 1988; 
Cohen 1992; Ngo & Marks 1993; Li et al. 1994; Rose et al. 1996; Gleicher 1997) and manipulating 
(Hodgins & Pollard 1997) dynamically correct motion, simulating biomechanical properties of 
the human body (Komura & Shinagawa 1997) and adjusting motions to display affect or achieve 
new goals (Bruderlin & Williams 1995; Yamane & Nakamura 2000). 

4.2 Human motion 

Although the technical means for creating movements that appear natural and express affect, 
skill, etc. are fundamental, it is important to consider the production and visual perception of 
human movement. The study of human motor control for instance holds the potential to reveal 
techniques that improve the replication of human-like motion. A key factor is the representation of 
movement. Interactions between humans and humanoids may improve if both have similar 
representations of movement. For example, in the current scenario the goal is for the human and 
robot to achieve a smooth and graceful trajectory. There are various objective ways to express 
smoothness. It can be anticipated that if both the humanoid and human shared the same 
representation of smoothness then the two actors may converge more quickly to a graceful path. 
The visual perception of human movement likewise holds the potential to improve the quality of 
human-robot interactions. The aspects of movement that are crucial for interpreting the motion 
correctly may be isolated according to an analysis of the features of motion to which humans are 
sensitive. For example, movement may be regarded as a complicated spatiotemporal pattern, but 
the recognition of particular styles of movement might rely on a few isolated spatial or temporal 
characteristics of the movement. Knowledge of human motor control and the visual perception 
of human movement could thus beneficially influence the design of humanoid movements. 
Several results from human motor control and motor psychophysics inform our understanding 
of natural human movements. It is generally understood several factors contribute to the 
smoothness of human arm movements. These include the low-pass filter characteristics of the 
musculoskeletal system itself, and the planning of motion according to some criteria reflecting 
smoothness. The motivation for such criteria could include minimizing the wear and tear on the 
musculoskeletal system, minimizing the overall muscular effort, and maximizing the compliance 
of motions. Plausible criteria that have been suggested include the minimization of jerk, i.e., the 
derivative of acceleration (Flash & Hogan 1985), minimizing the torque change (Uno et al. 1989), 
the motor-command change (Kawato 1992), or signal dependent error (Harris & Wolpert 1998). 
There are other consistent properties of human motion besides smoothness that have been 
observed. For example, that the endpoint trajectory of the hand behaves like a concatenation of 
piecewise planar segments (Soechting & Terzuolo 1987a; Soechting & Terzuolo 1987b). Also, the 
movement speed is related to its geometry in terms of curvature and torsion. Specifically, it has 
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been reported that for planar segments velocity is inversely proportional to curvature raised to 
the l/3rd power, and that for non-planar segments the velocity is inversely proportional to the 
l/3rd power of curvature multiplied by l/6th power of torsion (Lacquaniti et al. 1983; Viviani & 
Stucchi 1992; Pollick & Sapiro 1996; Pollick et al. 1997; Handzel & Flash, 1999). Extensive 
psychological experiments of the paths negotiated by human-humanoid dyads could inform 
which principles of human motor control are appropriate for describing human-humanoid 
cooperative behaviours. 

4.3 Style and affect 

Recent results examining the visual recognition of human movement are also of relevance 
with regard to the performance of motion embodying human-like styles. By considering the 
relationship between movement kinematics and style recognition, it has been revealed that 
recognition can be enhanced by exaggerating temporal (Hill & Pollick 2000), spatial (Pollick 
et al. 2001a), and spatiotemporal (Giese & Poggio 2000; Giese & Lappe 2002) characteristics 
of motion. The inference of style from human movement (Pollick et al. 2001b) further 
supports the notion that style may be specified at a kinematic level. The kinematics of 
motion may thus be used to constrain the design of humanoid motion. 

However, the meaningful kinematic characteristics of motion may rely on dynamic properties 
in a way that can be exploited for control purposes. The brief literature review on human 
motor control and visual perception of human movement above provides a starting point for 
the design of interactive behaviours with humanoid robots. The points addressed focus on the 
motion of the robot and may be viewed as dealing with the problem in a bottom up fashion. In 
order to make progress in developing natural and affective motion it is necessary to determine 
whether or not a given motion embodies these characteristics effectively. However, it is 
possible that cognitive factors, such as expectancies and top down influences might dominate 
interactions between humans and humanoids, e.g., the humanoid could produce a natural 
movement with affect but the motion could still be misinterpreted if there is an expectation 
that the robot would not move naturally or display affect. 

5. Conclusion 

Having described the Sticky Hands project: it's origin, hardware and software 
implementation, biological inspiration, empirical evaluation, theoretical considerations and 
implications, and having broadened the later issues with a comprehensive discussion, we 
now return to the enquiries set forth in the introduction. 

The Sticky Hands project itself demonstrates a natural interaction which has been 
accomplished effectively -the fact that the objectives of the interaction are in some aspects 
open-ended creates leeway in the range of acceptable behaviours but also imposes complex 
high-level planning requirements. Again, while these may be regarded as peculiar to the 
Sticky Hands game they also reflect the breadth of problems that must be tackled for 
advanced interactions with humans. The system demonstrates through analysis of human 
motion, and cooperation how motion can be rendered naturally, gracefully and aesthetically. 
These characteristics are both key objectives in Sticky Hands interaction, and as we have 
indicated in the discussion also have broader implications for the interpretation, quality and 
effectiveness of interactions with humans in general for which the attribution of human 
qualities such as emotion engender an expectation of the natural social cues that improve 
the effectiveness of cooperative behaviour through implicit communication. 
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We have drawn considerable knowledge and inspiration from the fields of computer 
graphics, motion perception and human motion performance. The benefit that the latter two 
fields offer for humanoid robotics reveal an aspect a larger relationship between humanoid 
robotics and neuroscience. There is a synergistic relationship between the two fields that 
offers mutual inspiration, experimental validation, and the development of new 
experimental paradigms to both fields. We conclude that exploring the depth of this 
relationship is a fruitful direction for future research in humanoid robotics. 
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1. Introduction 

An obvious problem confronting humanoid robotics is the generation of stable and 
efficient gaits. Whereas wheeled robots normally are statically balanced and remain 
upright regardless of the torques applied to the wheels, a bipedal robot must be actively 
balanced, particularly if it is to execute a human-like, dynamic gait. The success of gait 
generation methods based on classical control theory, such as the zero-moment point 
(ZMP) method (Takanishi et al., 1985), relies on the calculation of reference trajectories for 
the robot to follow. In the ZMP method, control torques are generated in order to keep the 
zero-moment point within the convex hull of the support area defined by the feet. When 
the robot is moving in a well-known environment, the ZMP method certainly works well. 
However, when the robot finds itself in a dynamically changing real-world environment, 
it will encounter unexpected situations that cannot be accounted for in advance. Hence, 
reference trajectories can rarely be specified under such circumstances. In order to address 
this problem, alternative, biologically inspired control methods have been proposed, 
which do not require the specification of reference trajectories. The aim of this chapter is 
to describe one such method, based on central pattern generators (CPGs), for control of 
bipedal robots. 

Clearly, walking is a rhythmic phenomenon, and many biological organisms are indeed 
equipped with CPGs, i.e. neural circuits capable of producing oscillatory output given tonic 
(non-oscillating) activation (Grillner, 1996). There exists biological evidence for the presence 
of central pattern generators in both lower and higher animals. The lamprey, which is one of 
the earliest and simplest vertebrate animals, swims by propagating an undulation along its 
body. The wave-like motion is produced by an alternating activation of motor neurons on 
the left and right sides of the segments along the body. The lamprey has a brain stem and 
spinal cord with all basic vertebrate features, but with orders of magnitude fewer nerve cells 
of each type than higher vertebrates. Therefore, it has served as a prototype organism for the 
detailed analysis of the nervous system, including CPGs, in neurophysiological studies 
(Grillner, 1991; Grillner, 1995). In some early experiments by Brown (Brown, 1911, Brown, 
1912), it was shown that cats with transected spinal cord and with cut dorsal roots still 
showed rhythmic alternating contractions in ankle flexors and extensors. This was the basis 
of the concept of a spinal locomotor center, which Brown termed the half-center model 
(Brown, 1914). Further biological support for the existence of a spinal CPG structure in 
vertebrates is presented in (Duysens & Van de Crommert, 1998). 
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However, there is only evidence by inference of the existence of human CPGs. The strongest 
evidence comes from studies of newborns, in which descending supraspinal control is not 
yet fully developed, see e.g. (Zehr & Duysens, 2004) and references therein. Furthermore, 
advances made in the rehabilitation of patients with spinal cord lesions support the notion 
of human CPGs: Treadmill training is considered by many to rely on the adequate afferent 
activation of CPGs (Duysens & Van de Crommert, 1998). In view of the results of the many 
extensive studies on the subject, it seems likely that primates in general, and humans in 
particular, would have a CPG-like structure. 

In view of their ability to generate rhythmic output patterns, CPGs are well suited as the 
basis for bipedal locomotion. Moreover, CPGs exhibit certain properties of adaptation to the 
environment: Both the nervous system, composed of coupled neural oscillators, and the 
musculo-skeletal system have their own nonlinear oscillatory dynamics, and it has been 
demonstrated that, during locomotion, some recursive dynamics occurs between these two 
systems. This phenomenon, termed mutual entrainment, emerges spontaneously from the 
cooperation among the systems' components in a self-organized way (Taga et al., 1991). 
That is, natural periodic motion, set close to the natural (resonant) frequency of the 
mechanical body, is achieved by the entrainment of the CPGs to a mechanical resonance by 
sensory feedback. The feedback is non-essential for the rhythmic pattern generation itself, 
but rather modifies the oscillations in order to achieve adaptation to environmental changes. 
In the remainder of this chapter, the use of CPGs in connection with bipedal robot control 
will be discussed, with particular emphasis on CPG network optimization aimed at 
achieving the concerted activity needed for bipedal locomotion. However, first, a brief 
introduction to various CPG models will be given. 

2. Biological and analytical models for CPGs 

2.1 Models from biology 

From biological studies, three main types of neural circuits for generating rhythmic motor 
output have been proposed, namely the closed-loop model, the pacemaker model, and the 
half-center model. 

The closed-loop model was originally proposed for the salamander (Kling & Szekely, 1968). 
In some way it resembles the half-center model (see below), but the interneurons are 
organized in a closed loop of inhibitory connections. There are corresponding pools of 
motor neurons activated, or inhibited, in sequence, allowing for a finer differentiation in the 
activation of the flexors and extensors, respectively. 

In the pacemaker model, rhythmic signals result as an intrinsic cell membrane property, 
involving complex interaction of ionic currents, of a group of pacemaker cells. The electrical 
impulses that control heart rate are generated by such cells. The pacemaker cells drive flexor 
motor neurons directly, and bring about concurrent inhibition of extensor motor neurons 
through inhibitory interneurons. These two models are further described in (Shephard, 
1994). 

The half -center model, mentioned above, was suggested by Brown (Brown, 1914) in order to 
account for the alternating activation of flexor and extensor muscles of the limbs of the cat 
during walking. Each pool of motor neurons for flexor or extensor muscles is activated by a 
corresponding half -center of interneurons, i.e. neurons that send signals only to neurons and 
not to other body parts (such as muscles). Another set of neurons provides a steady 
excitatory drive to these interneurons. Furthermore, inhibitory connections between each 
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half-center of interneurons ensure that when one half-center is active, the other is being 
suppressed. It was hypothesized that, as activity in the first half -center progressed, a process 
of fatigue would build up in the inhibitory connections between the two half-centers, 
thereby switching activity from one half-center to the other (Brown, 1914). Since then, 
support for the half -center model has been found in experiments with cats (Duysens & Van 
de Crommert, 1998). 

2.2 Computational CPG Models 

In mathematical terms, CPGs are usually modeled as a network of identical systems of 
differential equations, which are characterized by the presence of attractors, i.e. bounded 
subsets of the phase space to which the dynamics becomes confined after a sufficiently long 
time (Ott, 1993). Usually, a periodic gait of a legged robot is a limit cycle attractor, since the 
robot periodically returns to (almost) the same configuration in phase space. 
Several approaches for computational modeling of the characteristics of CPGs can be found 
in the literature: Drawing upon neurophysiological work on the lamprey spinal cord, 
Ekeberg and co-workers have studied CPG networks based on model neurons ranging from 
biophysically realistic neuronal models, describing the most important membrane currents 
and other mechanisms of importance (Ekeberg et al., 1991), to simple connectionist-type 
non-spiking neurons (Ekeberg, 1993). The use of the biophysical models makes it possible to 
compare the simulation results directly with corresponding experimental data. The 
advantage of using the simpler model, on the other hand, is the weak dependence of certain 
parameters that are hard to measure experimentally. 




fa 
"/ 

Fig. 1. The Matsuoka oscillator unit. The nodes (1) and (2) are referred to as neurons, or cells. 
Excitatory connections are indicated by open circles, and inhibitory connections are 
indicated by filled disks. 

However, in this work the CPG model formulated in mathematical terms by Matsuoka 
(Matsuoka, 1987) has been used for the development of CPG networks for bipedal walking. 
The Matsuoka model is a mathematical description of the half-center model. In its simplest 
form, a Matsuoka CPG (or oscillator unit) consists of two neurons arranged in mutual 
inhibition, as depicted in Fig. 1. The neurons in the half -center model are described by the 
following differential equations (Taga, 1991): 



T„U; 



w=- v ,+y, ( 2 ) 
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where u; is the inner state of neuron i, Vi is an auxiliary variable measuring the degree of self- 
inhibition (modulated by the parameter /?) of neuron i, t u and t v are time constants, u is an 
external tonic (non-oscillating) input, Wij are the weights connecting neuron j to neuron i, 
and, finally, y, is the output of neuron i. Two such neurons arranged in a network of mutual 
inhibition (a half -center model), form an oscillator, in which the amplitude of the oscillation 
is proportional to the tonic input Uo. The frequency of the oscillator can be controlled by 
changing the values of the two time constants t u and r v . If an external oscillatory input is 
applied to the input of a Matsuoka oscillator, the CPG can lock onto its frequency. Then, 
when the external input is removed, the CPG smoothly returns to its original oscillation 
frequency. This property, referred to as entrainment, is highly relevant for the application of 
the Matsuoka oscillator in adaptive locomotion (Taga, 1991). 

3. CPGs in bipedal robot control 

Generating robust gaits for bipedal robots using artificial counterparts to biological CPGs is 
an active field of research. The first results in this field were obtained using simple 2D 
models, and somewhat later, simplified 3D models. The most recent results, however, cover 
the use of realistic 3D simulations often corresponding to real, physical robots (Righetti & 
Ijspeert, 2006). Several results have also been implemented using real robots, involving both 
2D locomotion (Endo et al., 2004; Lewis et al., 2005) and full 3D locomotion (Ogino et al. 
2004). 

3.1 CPG-based control of simulated robots 

In works by Taga and co-workers (Taga et al., 1991; Taga, 2000), a gait controller based on 
the half -center CPG model was investigated for a 2D simulation of a five-link bipedal robot. 
By creating global entrainment between the CPGs, the musculoskeletal system, and the 
environment, robustness against physical perturbations as well as the ability to walk on 
different slopes were achieved (Taga et al., 1991). Moreover, the possibility to regulate the 
step length was realized and demonstrated in an obstacle avoidance task (Taga, 2000). 
Reil and Husbands (Reil & Husbands, 2002) used genetic algorithms (GAs) in order to 
optimize fully connected recurrent neural networks (RNNs), which were used as CPGs to 
generate bipedal walking in 3D simulation. The GA was used for optimizing weights, time 
constants and biases in fixed-architecture RNNs. The bipedal model consisted of a pair of 
articulated legs connected with a link. Each leg had three degrees-of -freedom (DOFs). The 
resulting CPGs were capable of generating bipedal straight-line walking on a planar surface. 
Furthermore, by integrating the gait controller with an auditory input for sound 
localization, directional walking was achieved. 

In a recent work by Righetti and Ijspeert, a system of coupled nonlinear oscillators was used 
as programmable CPGs in a bipedal locomotion task (Righetti & Ijspeert, 2006). The CPG 
parameters, such as intrinsic frequencies, amplitudes, and coupling weights, were adjusted 
to replicate a teaching signal corresponding to pre-existing walking trajectories. Once the 
teaching signal was removed, the trajectories remained embedded as the limit cycle of the 
dynamical system. The system was used to control 10 joints in a 25 DOF simulated HOAP-2 
robot (the remaining joints were locked). It was demonstrated that, by varying the intrinsic 
frequencies and amplitudes of the CPGs, the gait of the robot could be modulated in terms 
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of walking speed and step length. By continuously decreasing the speed, the robot could be 
brought to a halt in a controlled way. It was also possible to generate backwards walking by 
simply inverting the sign of all frequency parameters in the CPG controller. 

3.2 CPG-based control of real, physical robots 

The examples given above involve simulated robots. However, some studies involving 
CPG-based control of real, physical robots have also been made. In work by Endo and co- 
workers (Endo et al., 2004) and Lewis and co-workers (Lewis et al., 2005), controllers based 
on the half-center CPG model were employed. The robots used were, however, somewhat 
simplified, having only 4 DOFs each. Further simplification included a supporting rod that 
was attached to the robots and to the floor, in order to prevent the robots from falling over 
sideways. A more complex robot was used in (Ogino et al., 2004) in a gait generation task. 
Here, reinforcement learning based on visual feedback was used to adjust the parameters of 
a CPG-based controller for a HOAP-1 robot, in order to achieve directional walking. 

3.3 Optimization of CPG-based controllers 

The examples given above confirm that CPGs are indeed suitable for generation of gaits and 
other types of repetitive motions. However, in most studies, the design of the CPG networks 
has commonly been carried out manually, in an intuitive manner, e.g. (Taga, 2000; Ogino et 
al., 2004). This is a time-consuming and difficult process, and it may also lead to suboptimal 
performance. By contrast, GAs are well suited for structural optimization, i.e. modification 
of the CPG network structure during optimization. However, even in cases where GAs have 
been applied, e.g. (Reil & Husbands, 2002), the approach has generally been restricted to 
parametric optimization in a network of fixed architecture. Furthermore, the use of 
predefined trajectories, as in (Righetti & Ijspeert, 2006), limits the approach only to those 
situations for which such trajectories exist. Nevertheless, the results presented in (Righetti & 
Ijspeert, 2006) reveal one of the advantages associated with a CPG-based approach, namely 
the ability of the corresponding controllers to smoothly change the gait pattern online, by 
simple parameter modification. 

The work presented in the remainder of this chapter concerns simultaneous parametric and 
structural optimization of CPG networks in a gait generation task, using GAs as the 
optimization tool. Some early results from this study can also be found in (Wolff et al., 2006). 
When generating bipedal locomotion by artificial evolution from a starting point of 
essentially random CPG networks (or other controllers, for that matter), a great challenge 
concerns the definition of the fitness function. Using, for example, the distance covered in 
the initial forward direction as the fitness measure commonly results in controllers that 
simply throw the body of the robot forward, rather than walking. While actual walking 
would certainly result in higher fitness values, such solutions are very hard to find, given 
the easily accessible local optimum found by throwing the robot's body forward. Adding 
constraints on body posture as part of the fitness measure will in most cases only lead to 
other non-desirable gaits that display very limited similarity with human-like gaits. Thus, in 
the experiments presented below, rather than trying to evolve an upright, human-like 
bipedal gait by an ingenious definition of a fitness function, a supporting structure will be 
used for the purpose of helping the robot to balance as it starts to walk. Some different 
strategies for subsequently removing this support, while maintaining a dynamically stable 
gait, will then be investigated. 
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In the simulation experiments presented here, a fully three-dimensional bipedal robot with 
14 DOFs, shown in the leftmost panel of Fig. 2, was used. The simulated robot weighs 7 kg 
and its height is 0.75 m. The distance between the ground and the hips is 0.45 m. The waist 
has 2 DOFs, each hip joint 3 DOFs, the knee joints 1 DOF each, and the ankle joints 2 DOFs 
each, as illustrated in the in Fig. 2 (second panel from the left). 

The simulations were carried out using the EvoDyn simulation library (Pettersson, 2003), 
developed at Chalmers University of Technology. Implemented in object-oriented Pascal, 
EvoDyn runs both on Windows and Linux platforms and is capable of simulating tree- 
structured rigid-body systems. Its dynamics engine is based on a recursively formulated 
algorithm that scales linearly with the number of rigid bodies in the system (Featherstone, 
1987). For numerical integration of the state derivates of the simulated system, a fourth 
order Runge-Kutta method is used. Visualization is achieved using the OpenGL library. 




Fig. 2. The two panels to the left show the simulated robot and its kinematics structure with 
14 DOFs, used in this work. The panels on the right show the dimensions of the robotic 
body. 

In the simulation experiments, the output of the CPG network was either interpreted as 
motor torques, which were applied directly to the joint actuators, or as desired joint angles 
for the robot's joints to follow. In the latter case, a standard PD controller was implemented 
in order to generate the adequate torques to the joint actuators. 



4. Optimization of CPG networks 

This section presents the results from experiments conducted with the 14 DOF simulated 
robot introduced above. First, however, the setup of the experiments will be described in 
some detail, starting with a description of the fitness measure used in connection with the 
GA. 

The method for CPG network generation presented here is intended for use in connection 
with fully three-dimensional bipedal robots, with many DOFs. For such robots, designing a 
CPG network by hand is a daunting task. Instead, in this method, a GA is used for carrying 
out structural and parametric optimization of the CPG network, with the aim of achieving 
bipedal walking, without specific reference trajectories. The approach poses many 
challenges, particularly since no a priori knowledge of the needed control signals is 
available. In fact, in the experiments presented below, the CPG networks have been evolved 
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starting from randomly generated initial populations (the members of which are referred to 
as individuals, following standard GA terminology) of such networks. Thus, for experiments 
of this kind, the first challenge is to choose a suitable fitness measure that will favor CPG 
networks capable of executing an upright bipedal gait. 




Fig. 3. The posture-support structure used in this work for helping the robot to balance as it 
starts to walk. 

4.1 Fitness measure 

How should one judge the motion patterns generated in the beginning of the evolutionary 
process, given the fact that, in all likelihood, none of the individuals are able to walk at that 
stage? The simplest way, perhaps, would be to judge the individuals by their ability to move 
forward, i.e. to use, as the fitness measure, simply the covered distance in the initial forward 
direction, starting each evaluation with the simulated robot in an upright position. By thus 
favoring individuals that can carry out some form of movement in the right direction, one 
may assume that, sooner or later, individuals will appear that are able to walk indefinitely. 
This, however, is not what typically happens. Instead, it is more common that the 
evolutionary process quickly discovers a local optimum in the fitness space, such that the 
robot simply throws its body forward, thus receiving a relatively high fitness compared to 
robots that simply fall down at the starting position. Once this local optimum has been 
found, the evolutionary process often gets stuck in some sort of crawling motion, or even a 
tripod-like gait in which the torso is used as a third leg. Of course, such mediocre results 
can, in principle, be avoided by adding constraints on body posture as part of the fitness 
measure. However, finding relevant constraints is a difficult task, especially in fully three- 
dimensional simulations. Usually, a modification of the fitness measure simply leads to the 
discovery of other unnatural (and thus undesirable) gaits. 

In order to deal with this problem, an approach has been proposed that uses a supporting 
structure (shown in Fig. 3) added to the robot to help it maintain its balance as it starts to 
walk (Wolff et al., 2006). Introducing posture support enables those individuals that 
produce some sort of repetitive leg motion (but are initially unable to maintain balance) 
resulting in forward motion to gain high fitness. Another benefit of this approach is that the 
fitness measure can be left quite simple, e.g. the distance covered in the forward direction 
(even though some additional punishment must be introduced as well, as described in 
Section 4 below). On the other hand, the problem of when, and how, to remove the support 
must be considered instead. Several strategies for eliminating the support have therefore 
been investigated. Despite its appearance, it is modeled as a massless structure and therefore 
it does not affect the robot's dynamics, unless it starts to fall. 
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4.2 Structural and parametric optimization 

The elementary CPG network structure, shown in Fig. 4, has been designed in a way that 
each joint is assigned a half -center oscillator (depicted in Fig. 1). Thus, in this case, the 
structural optimization part consists of deciding the presence, or absence, of the connections 
between the CPGs for different joints. Furthermore, motivated by the fact that the 
movements of the left and right parts of a bipedal robot are symmetrical during walking, 
symmetry constraints on the CPG network were imposed. Thus, the structure of the CPGs 
on the right side of the robot mirrors that of the left side, considerably reducing the size of 
the search space for the GA. For example, the connection weight between the left hip and the 
left knee is equal in value to the weight connecting the right hip to the right knee. 



Waist n Waist 





Ankle Q n D p Ankle 
hoot Foot 

Fig. 4. Left panel: the structure of the CPG network used in the simulations, with the hipi 
joint CPG marked with a darker shade of gray. The connections between the individual 
oscillators are represented, in a slightly simplified way, by the arrows in the figure; see the 
text for a more detailed explanation. Right panel: The robot's kinematics structure with joint 
names indicated. 

The CPG network is further constrained such that only the hipi CPG, responsible for rotation 
in the sagittal plane, can be connected to all the other ipsilateral 1 joint CPGs, the 
corresponding contralateral hipi CPG, and the waist CPGs. Note, however, that the hipi CPG 
on a given side, can only receive connections from the corresponding contralateral hipi CPG, 
see Fig. 4 for more details. The remaining joint oscillators do not transmit any signals at all. 
The introduction of these constraints was motivated by several preliminary experiments, the 
results of which indicated that the evolutionary process was considerably less efficient when 
a fully connected CPG network was used, yet without improving the final results. 
For simplicity, in Fig. 4, the connections between CPG units are depicted with a single 
arrow. In fact, each such arrow represents a maximum of four unique connections between 
two joint oscillators, as indicated in Fig. 5. This figure depicts, as an example, the possible 
connections between the hipi CPG and the knee CPG. Thus, given the symmetry and 
connectivity constraints presented above, a total of 8x4 = 32 connections are to be 
determined by the GA. Note, that it is possible that the GA may disable all connections, Wi - 



1 The term ipsilateral refers to the same side of the body, and is thus the opposite of the term 

contralateral. 
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W4 between two CPG units. If this happens, one connection will be forced between the two 
CPGs. This is motivated by the fact that, without any interconnection, no synchronization 
between the two joints will be possible. In this setup, the forced connection will be added 
between the flexor neuron in the transmitting CPG, and the flexor neuron in the receiving 
CPG. There is, however, no particular reason for preferring this connection; Choosing any 
other connection would not make any fundamental difference. 
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Fig. 5. All possible connections between the hipi CPG and the knee CPG. 

For optimization of the CPG network, a standard genetic algorithm, (Holland, 1975), was 
chosen. In the GA, the CPG network of a given individual was represented by two 
chromosomes; one binary- valued chromosome determining the presence or absence of each 
of the 32 evolvable connections, and one real-valued chromosome determining the 
parameter values for those connections that are actually used in a given individual. In 
addition to these two chromosomes, a third (real-valued) chromosome was used for 
determining the sign and strength of the different feedback paths, as described in the next 
subsection. 

In early experiments, the internal parameters (see Eqs. (1) and (2)) of the individual half- 
center CPGs were also evolved by the GA. However, with the supporting structure present, 
evolution always promoted parameters values that produced unnaturally large steps, so 
that the support could be exploited for reaching higher fitness values. In such cases, the 
robot became highly dependent on the supporting structure for balance, making it even 
more difficult to find an appropriate removal strategy for the support. For this reasons, the 
internal parameters of the individual half -center CPGs were set to fixed values, generating a 
motion frequency of the robot's legs approximating that of normal walking. The chosen 
parameters are shown in Fig. 6 along with the resulting output from the half-center 
oscillator. These parameters were applied to all CPGs, except for the knee joint CPGs and the 
waisti CPG. In analogy with human gait, the knee joint oscillator and the waisti oscillator 
were tuned to generate a rhythmic pattern with double frequency compared to the other 
CPGs. Thus, for these joints' CPGs, the t u and i v values were set to half of the values used for 
the other CPGs. 

In each run, a population of 180 individuals was used. Selection of individuals (for 
reproduction), was carried out using tournament selection, with a tournament size of 8 
individuals. In a given tournament, the individual with the highest fitness value was 
selected with a probability of 0.75, and a random individual (among the 8) with 0.25 
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probability. After each selection step, the mutation operator was applied, randomly 
changing a gene's value with probability 10/N, where N denotes the total number of genes 
in the three chromosomes of the individual. 




Fig. 6. Left panel: The half-center oscillator assigned to each joint. Right panel: The 
parameter values used, along with a typical output signal. Note that the knee and waistj joint 
oscillators used different values of the time constants (see the main text for details). 

The fitness measure was taken as the distance walked by the robot in the initial forward 
direction, x, decreased by the sideways deviation y. The fitness function F, for a given 
individual i, can thus be expressed as: 

F(i) = x-\y\ (4) 

In addition to the fitness function, several indirect punishments were introduced during 
evaluation. For example, since the two-point supporting structure provided no support in 
the sagittal plane, the robot could still fall to the ground during evaluation, leading to the 
emergence of motion patterns such as crawling. Avoiding this was important, particularly 
in the beginning of the evolutionary process, since these individuals could gain better 
fitness than the very first individuals that walked with an upright body posture. Thus, 
even though the support was used, the resulting gaits could develop towards gait 
patterns other than upright walking. Thus, a rule was introduced such that if a robot's 
hips collided with the ground, the evaluation of that particular individual was 
terminated. Other punishments or modifications that were also introduced are described 
in the experiment section. 



4.2 Feedback signals 

In order to guide the evolutionary process towards an upright and stable bipedal gait, 
feedback was introduced by measuring the waist angle, thigh angle, and lower leg angle, all 
relative to the vertical axis. The introduction of feedback paths for the generation of bipedal 
locomotion has proven to be important in achieving adaptation to the environment and 
forming the overall gait pattern (Taga et al., 1991; Righetti & Ijspeert, 2006). In the 
experiments presented here, the structure of the feedback network was determined in 
advance. However, the actual type of the connection, i.e. whether it is inhibitory or 
excitatory, and the strength of the feedback paths were determined by the GA. Symmetry 
constraints were applied also in this case, meaning that the feedback structure of the right 
side of the robot mirrored that of the left side. Furthermore, motivated by biological findings 
indicating that tactile feedback from the foot is essential for human locomotion (Van Wezel 
et al., 1997), a touch sensor in each foot was introduced in the simulations. The signal from 
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the touch sensor was used both as tactile feedback, i.e. to indicate foot-to-ground contact, 

but also to enable, or prohibit, other feedback signals to be connected to certain CPGs 

during a specific phase of the gait; see Fig. 7 for an example. 

In this section, only some illustrative examples of the feedback network structure are given. 

For a detailed description of the feedback paths see (Wolff et al., 2006). The feedback signals 

to an individual CPG are introduced by adding an extra term to Eq. (1), which then 

becomes: 

N 

r>, = ~ u i - P v , + X a «yj + M o + / ( 5 ) 

where /denotes the sum of all feedback signals connected to the particular CPG. 
When designing the feedback network, the main idea was to trigger the different phases 
in a gait cycle. For example, for the left hipi CPG, the feedback structure was designed in 
the following way: The flexor and the extensor neuron receive two feedback signals, as 
shown in the left panel of Fig. 7: One signal from the right-foot touch sensor, and one 
signal measuring the inclination angles of the left and right upper leg, scaled by the 
strength of the feedback connections. The feedback signal from the touch sensor is 
intended to trigger the swing motion of the left leg, as soon as the right foot touches the 
ground. Feedback from the inclination angles of the upper legs is intended to promote 
repetitive leg motion in such a way that, when the left leg is moving forward, the right leg 
should move backwards, and vice versa. The feedback paths connected to the hipi CPG 
are depicted in the left panel of Fig. 7. Note, that the type of connections shown in the 
figure are chosen to easier demonstrate how the feedback signals were intended to affect 
the motion of the hipi joint. However, since the dynamical system of a single CPG unit 
becomes rather complex in an interaction with other CPGs, the robot, and the 
environment, it is very hard to predict if the chosen feedback configuration will work as 
intended. For this reason, both the sign and the strength of each connection were 
determined by the GA (but not the structure of the feedback paths). Fig. 7 shows an 
example in which the feedback connection to the flexor neuron of a certain CPG happened 
to be of opposite sign compared with the feedback connection to the extensor neuron of 
the same CPG. 

In the right panel of Fig. 7, the feedback paths connected to the right hip2 CPG are 
illustrated. In same way as above, the illustrated types of connections in the figure intend 
only to demonstrate how the structure of the feedback for this joint was planned to affect the 
motion of the hip2 joint. It does not represent the best connection configuration (in terms of 
the sign and the strength) chosen by the GA. In the case of hip2, only the feedback signal 
measuring the inclination angle of the right upper leg is connected to the oscillator. 
However, the feedback is only transmitted to the CPG unit if the touch sensor in the right 
foot is activated, i.e. when the right foot is on the ground. This configuration was chosen 
with the intention of ensuring that the hip2 CPG was able to generate more torque in the 
stance phase, i.e. when the right leg is moving backwards, during which the entire body of 
the robot is resting on the right leg. 

One should notice that by choosing an adequate feedback structure, the feedback signals 
may indirectly force synchronization between different joints, even if the connection weights 
between the joints' CPG units may not necessarily ensure this by themselves. In this work, a 
total of 20 parameters, determining the sign and the strength of the different feedback paths 
were evolved by the GA; see (Wolff et al., 2006) for details. 
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Fig. 7. The simulated robot with the hipi and hipi joint CPGs and the corresponding 
feedback, with specific connection types (i.e. signs) chosen to illustrate how the feedback 
was intended to affect the corresponding joint's motion (see the main text for details). The 
hipi flexor neuron rotates the leg in the counter-clockwise direction in the sagittal plane 2 , 
while the hifi flexor neuron rotates the leg in the clockwise direction in the frontal plane 
(seen from behind the robot). 



4.3 Experimental setup 

In this subsection, the setup used in the simulation experiments will be discussed. Two 
approaches for generating dynamically balanced bipedal locomotion will be described, the 
difference being the method for removing the support structure. 

In both approaches, a two-point supporting structure, as depicted in Fig. 3, was used. 
Initially, a four-point support was used, in an open-loop fashion without feedback signals. 
However, the GA exploited the support to such an extent that no useful results were 
obtained (Wolff et al., 2006). For example, the individuals obtained walked with unnaturally 
large steps and were unable to balance without constant use of the supporting structure. 
Thus, instead a two-point structure was introduced in order to minimize exploitation (Wolff 
et al., 2006), while still providing some support. The contact points of the support structure 
were placed 2 m from the robot and 0.25 m above the ground. This configuration was 
chosen to ensure small sideways leaning angle and at the same time allow the robot to bend 
its knees without the support touching the ground. 

Method 1: Evolution in two steps 

In this approach, the evolutionary process was divided into two steps. During the first step, 
the posture support was present and the hvpi, hips, and ankle joints were locked. In this step, 
the intention was to evolve a CPG network capable of producing a stable upright gait in the 
sagittal plane. Once a stable individual had been obtained, it was cloned, creating a new 
population consisting of copies of this individual. At this stage, the second step was 
initiated, in which the support was removed and the GA was assigned the task of finding a 



The sagittal plane is a vertical plane running from front to back through the body, while the 
frontal plane is a vertical plane running from side to side through the body, perpendicular to the 
sagittal plane 
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way to balance the robot in the frontal plane as well. Before the second step of evolution was 
started, the hvpi and ankle joints were unlocked, and the corresponding genes were set 
randomly for each individual in the population. Since the waist joints also affect the frontal 
plane balance, the corresponding genes were randomly initiated as well. The remaining 
genes (which take identical values for all individuals) ensured sagittal plane balance and 
were therefore left unchanged in the second step of evolution. In this approach the hips joint 
remained locked during the entire procedure. The fitness measure used in both steps was 
the distance walked by the robot in the initial forward direction, decreased by the sideways 
deviation, as formulated in Eq. (4). 

Since the two-point supporting structure provides no support in the sagittal plane, the 
evolutionary procedure found several easily accessible solutions for maintaining sagittal 
balance. In some cases, the evolved individuals used the torso as a third leg, or the knees, for 
maintaining balance. Other examples include forward motion using somersaults or walking 
with unnaturally large steps. In order to prevent the individuals from developing other 
means of locomotion than upright, human-like gaits, several constraints had to be 
introduced. In Method 1, the following rules and modifications were added: 

a) The contact points in the knee joints and the torso, used for detecting ground 
collisions, were removed. 

b) If the robot's hip collided with the ground, the evaluation run of that individual 
was aborted. 

c) If the robot's hips were located less than a 0.15 m above the ground, the supporting 
structure was immediately removed, and was not further used during the 
evaluation of the individual. 

The removal of the robot's contact points in the torso and the knees eliminated the 
possibility of misusing these body parts for support. Ending a run as soon as the robot's hip 
collided with the ground, efficiently removed all crawling solutions. Finally, punishing the 
individuals for having the hips too close to the ground successfully removed those 
individuals trying to take unnaturally large steps (for improved sagittal plane balance). If 
the step length is large and the support is removed in this way, the robot will most likely be 
unable to maintain the frontal plane balance. Thus it will fall to the ground, terminating its 
evaluation. Using these rules, the evolutionary procedure was strongly biased towards 
individuals walking with an upright posture. 

Initially, the posture support was intended to be present during the entire evaluation time of 
the first step, and then completely removed in the second step. However, it turned out that 
such an approach gave no obvious way of deciding at what point to interrupt the first step; 
This simply had to be judged by the experimenter in an ad hoc manner. Evolving for too 
long, for example, led to individuals that were indeed walking rapidly but, at the same time, 
over-exploiting the posture support, making the produced gait unsuitable for the second 
step, where the support was removed. For this reason, the time during which the support 
was present in the first step was changed from the entire evaluation period to only the first 
two seconds of the evaluation. This arrangement is motivated by the assumption that it is 
during the starting sequence, before entering the gait cycle, that the individuals are most 
vulnerable in terms of frontal plane balance. For this reason, in these experiments, the 
posture support was also present during the initial two seconds in the next step. The details 
regarding the experiments carried out using Method 1 are shown in Table 1. The results 
from the simulations are given in the next subsection. 
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Case 


Support 


t 


DOFs 


CPG genes 


Feedback genes 


1.1 


2 sec, 2-point, 0.25, 2 


40 


8 


40 


14 


1.2 


2 sec, 2-point, 0.25, 2 


40 


12 


32 


10 



Table 1. Details regarding the experiments made using Method 1. The first column indicates 
which step in Method 1 is being considered. In the column labeled support, the term 2 sec 
indicates that the (two-point) support was only present during the first two seconds of the 
evaluation time. The numbers i, j in the same column denote the initial placement of the 
contact points in a given run, where i is the height above the ground [m], and j is the 
horizontal distance from the hip [m]. The evaluation time (third column) is denoted by t [s]. 
The fourth column (DOFs) gives the number of active joints, and the last two columns 
indicates the number of genes evolved by the GA. 

Due to symmetry constraints, described in subsection 4.2, only those connections in the CPG 
network that correspond to one side of the robot, and the waist joints, must be evolved. Since 
there are four unique connections for each joint, a total of 8 genes per joint must be evolved, 
i.e. four genes determining the presence or absence of the connections, and four genes 
determining the parameters of the connections that are actually used. Thus, in the first step 
of Method 1, five unique joint configurations (hipi, knee, foot, waisti and waist2) must be 
evolved, leading to a total of 8x5=40 genes. Based on the feedback equations given in (Wolff 
et al., 2006) the number of genes needed to determine the characteristics of the feedback 
paths for these joints sums up to 14. 

In the second step of Method 2, the genes corresponding to the hipi, knee and the foot joints 
were not changed. Instead, the waisti, waist2, hipi, and the ankle joints were evolved, bringing 
the total number of CPG genes to 4x8=32. The number of genes needed for the feedback 
network sums up to 10 since the feedback paths connected to hipi, knee and the foot joints 
require four more parameters than the hipi, and the ankle joints (Wolff et al., 2006). 

Method 2: One-step evolution with 12 active DOFs 

In Method 1, the supporting structure was only allowed in the beginning of an evaluation 
run, motivated by the assumption that it is during the starting sequence that the individuals 
are most vulnerable in terms of balance. Since it was found that this assumption turned out 
to be correct, the somewhat cautious two-step approach used in Method 1 is perhaps 
unnecessary. Therefore, in Method 2, the evolution of CPG networks was carried out in one 
single step during which all joints except the hip?, joint were unlocked. Apart from this 
change, the additional rules, presented in connection with Method 1, were also valid here. 



Case 


Support 


t 


DOFs 


CPG genes 


Feedback genes 


Type of CPG 


2 


2 sec, 2-point, 0.25, 2 


40 


12 


56 


18 


torque 


3 


2 sec, 2-point, 0.25, 2 


40 


12 


56 


18 


angle 



Table 2. Details regarding the experiments made using Method 2. As in Method 1 the term 2 
sec in the column labeled Support indicates that the (two-point) support was only present 
during the first two seconds of the evaluation time. In the last column, the type of output 
produced by the CPG network is given. 

In addition, both a torque-generating CPG network and a network for generating joint 
angles were tested. In the latter case, the appropriate torques were generated using a 
standard PD controller, implemented as 
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T, = k(e r -e)- 



kA 



(6) 



where T; is the torque applied to joint i, k p is the proportional constant, kd is the derivative 
constant, Q, is the desired angle, and 0, is the current angle of the i th joint. In the experiment 
described in the 2 nd row of Table 2, k p and kj were chosen as 20.0 and 4.0, respectively. 
In both cases, the fitness measure was the distance walked by the robot in the initial forward 
direction, decreased by the sideways deviation, as formulated in Eq. (4). 



4.4 Results 

Method 1: 

In Method 1, the generation of upright bipedal locomotion was carried out in a two-step 
evolutionary procedure. In the first step, during which the Mp2, hif3 and ankle joints were 
locked, an individual walking with a speed of 0.58 m/s was evolved after 400 generations. 
The results for this individual are shown in Table 3, case 1.1. As Table 3 implies, the 
individual evolved here was fairly unstable and able to maintain balance only for one 
minute. Note that the stability in the frontal plane was only ensured by the waist joint, since 
the hif2 and ankle joints were locked. Even though slower, but stable, individuals were found 
earlier in evolution, i.e. individuals that maintained balance during the testing time of 20 
minutes, they were not suitable as a starting condition for the second step because of the 
way these individuals were walking: They were heavily leaning the torso forward, or 
backwards, keeping it motionless to create a sort of counterweight for balance. By contrast, 
the fastest individual maintained an active upright posture during the entire walking 
sequence. 



Case 


Method 


F 


V 


DOFs 


Resulting gait 


1.1 


1, stepl 


19.54 


0.56 


8 


slow, stable for 60 sec. 


1.2 


1, step2 


23.09 


0.58 


12 


slow, stable for 12 min. 


2 


2, torque 


35.56 


0.90 


12 


fast, stable for 42 sec. 


3 


2, angle 


52.46 


1.31 


12 


fast, stable for >20 min. 



Table 3. Results from the trials made using Method 1 and Method 2. In the column labeled 
Method, the method and the current step (or type of output) is given. F is the obtained fitness 
[m], and v denotes the average locomotion speed [m/s] of the robot during the evaluation 
period. DOFs is the number of active joints, and the last column gives a short description of 
the resulting gaits. 




Fig. 8. The best evolved gait pattern, using Method 1, with 12 active DOFs. The details of the 
corresponding individual are shown on the 2 nd row of Table 3. 

In the second step, the hipi and ankle joints were unlocked and the best individual, found 
within 100 generations, was able to walk with a speed of 0.58 m/s. However, when the 
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individual was tested for longer evaluation times, it fell after 12 minutes, see case 1.2 in 
Table 3. The resulting gait is depicted in Fig. 8. 

Method 2, torque-generating CPGs: 

While the CPG networks in Method 1 were evolved in two steps, with the hipi, hips and ankle 
joints locked at first, in Method 2, all joints, except the hip$ joint, were evolved in one step. 
When Method 2 was tested with a torque-generating CPG network, the main difference 
compared to Method 1 was the difficulty to evolve stable individuals. While the average 
speed of the best individual was significantly improved, the balance capabilities were at the 
same time reduced, and it was only after 800 generations that the best individual was able to 
balance during the entire evaluation time of 40 seconds. However, this result was not 
satisfactory since the individual did not walk more than 42 seconds when the evaluation 
time was expanded; see Table 3, case 2 for details. 

The resulting motion pattern resembled the one obtained in Method 1, shown in Fig. 8, with 
the exception of the foot joint, which was now more active during the lift-off phase. 
Nevertheless, the motion of the hif2 and ankle joints appeared rather nervous, suggesting 
that more restraining feedback to these joints' CPGs is necessary, preferably from their own 
joint angles, something that was not included in the current feedback network. 

Method 2, angle-generating CPGs: 

When Method 2 was tested in connection with angle-generating CPGs, the results were 
significantly improved, compared to the previous results, both in terms of locomotion speed 
and stability of the best individual, but also in terms of evolution time. The best individual 
found with this approach reached a distance of 52.46 m, having an average locomotion 
speed of 1.31 m/s (see Table 3, case 3). The corresponding individual, in the case of torque- 
generating CPGs, reached a distance of 35.56 m walking at an average locomotion speed of 
0.90 m/s. Furthermore, stable individuals capable of walking during the entire evaluation 
time of 40 seconds emerged quite early, around generation 25, compared to the case of 
torque-generating CPGs, where such results were obtained only after 800 generations. While 
the best individual using torque-generating CPGs was unable to maintain balance for more 
than 42 seconds, the best individual equipped with the angle-generating CPGs was able to 
walk without falling during a testing time of 20 minutes. The resulting gait is depicted in 
Fig. 9. 




Fig. 9. The best evolved gait pattern obtained using Method 2, with 12 active DOFs and with 
angle-generating CPGs. The details of the corresponding individual are shown on 4 th row of 
Table 3. 



4.5 Discussion 

The results of the study described above demonstrate the feasibility of using artificial 
evolution for designing CPG networks, as well as optimizing the parameters of such 
networks, in order to achieve efficient and stable bipedal walking for a simulated robot. 
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Both methods introduced here solved the problem of generating robust bipedal gaits. 
However, the first method showed some limitations in generating gaits with long-term 
stability. This drawback was a result of splitting up the evolutionary process into two parts. 
Since the Irnpi and ankle joints were locked during the first phase of Method 1, stability in the 
frontal plane was only ensured by the torso. As a result, the gaits had to be less dynamic in 
order for the robot to maintain balance in the frontal plane. Thus, these gaits were less 
suitable for generalization to full 3D, i.e. with all joints unlocked. Yet, the gaits evolved with 
this method were more stable than the solutions found in Method 2, when torque- 
generating CPGs were used in the latter method. The frontal plane balance seems to be the 
most critical point when trying to maintain an upright body posture. Thus, more feedback, 
based on the inclination of the body, should preferably be introduced to restrain the motion 
of some critical joints like the hif2 and the ankle. 

The best results were obtained with angle-generating CPGs. The major contribution to the 
improved results was the better motion control of the hip2 and ankle joints, which were now 
easily restrained using the PD controller. However, a problem with this approach is that the 
individuals start to take larger and larger steps, as evolution progresses. In order to prevent 
the occurrence of such large steps, small perturbations should perhaps be introduced during 
the evaluation of the individuals, preferably in the frontal plane. This should force the robot 
to walk in a more cautious manner. The need for the supporting structure during the initial 
seconds indicates that the CPG network handling the gait cannot fully handle also the start- 
up of the walking cycle. Thus, an extra controller, based on CPGs or other approaches, 
should be used for the start-up sequence. It should then be tuned to enter the walking cycle 
and hand over control to a CPG network in a smooth way. 

5. Current and future directions 

Throughout the CPG experiments presented here, the connection paths of the feedback 
network were pre-defined. Since the feedback paths were specified in an ad hoc manner, 
there is certainly no guarantee that these paths are optimal. Therefore, a topic for 
further work would be to investigate whether the structure of the feedback network 
could be improved by applying an optimization method. Furthermore, the constraints 
that were added to the CPG networks, e.g. the restricted number of connections, could 
also be removed in order to evolve gait patterns from fully connected CPG networks 
instead. 

In the case of bipedal robot control in a dynamically changing environment, more advanced 
control of posture and balance would be required. Thus, investigating how different kinds 
of feedback, such as information regarding foot joint angles and foot-to-ground contact, can 
influence the quality of the generated gaits is another relevant topic, particularly in view of 
the importance of tactile sensory feedback in the case of human bipedal walking (Ogihara & 
Yamazaki, 2001). 

As shown in Fig. 6 above, the internal parameters of each CPG unit were pre-defined as 
well, in order to prevent the optimization procedure from promoting parameter values 
that generate unnaturally large steps, in which case the support could be exploited for 
reaching higher fitness values. However, in Section 4.4, it was shown that the support was 
only necessary during the first seconds of each evaluation run (see Case 3, Method 2). 
Thus, with the removal of the support after a few seconds of walking, the possibility of 
exploiting it was strongly reduced, making evolution of individual CPG parameters 
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possible. Results from preliminary experiments of that kind indicate similar performance 
of the evolved gaits (in terms of walking speed and stability) as that obtained in Case 3 of 
Method 2. However, an interesting difference in the evolution of the gaits was observed. 
In the experiments described in Section 4.4, the first stable gaits that emerged had 
basically the same walking pattern as the final gait evolved in the same run. The major 
contribution to the fitness increase was the increase in walking speed, rather than step 
length. However, in the approaches where the individual CPG parameters are also 
evolved, the first stable gaits that emerge have considerably smaller step length than the 
final gait evolved in that run. 

Another topic that requires further development concerns gait transition from, say, a 
walking gait to a running gait. Two preliminary experiments have recently been carried out. 
In the first case, a stop-and-go routine was accomplished by manually changing the bias 
values and time constants appropriately. In the second case, a complete and smooth gait 
transition was realized by abruptly switching control from one CPG network to another. 
However, those maneuvers required quite some manual tuning of the CPG parameters in 
order to work fully. In a robot intended for real-world applications, the transitions must be 
carried out in a more automatic fashion. 

Ultimately, the CPG-based controllers should, of course, be implemented in real, physical 
robots. Preliminary experiments with a small humanoid robot with 17 DOFs have recently 
been carried out (Wolff et al., 2007). In these experiments the robot's gait was optimized 
using an evolutionary approach, including structural modifications of the gait control 
program. 
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1. Introduction 

In recent years, robots demonstrating an aping or imitating function have been proposed. 
Such functions can estimate the actions of a human being by using a non-contact method 
and reproduce the same actions. However, very few systems can imitate the behaviour of 
the hand or the fingers (Bernardin et al., 2005). On the other hand, reports in Neuroscience 
state that mirror neurons (Rizzolatti et al., 1996; Gallese et al., 1996), which participate in the 
actions imitated by chimpanzees, are activated only for actions of the hand and fingers, 
such as cracking peanut shells, placing a sheet of paper over an object, tearing the sheet into 
pieces, etc.. Moreover, since human beings perform intelligent actions using their hands, it is 
important to attempt to artificially imitate the actions of hands and fingers in order to 
understand the dexterity and intelligence of human hands from an engineering viewpoint. 
The object actions in the case where one "looks at an action performed by others and 
imitates it" include not only grasping or manipulating objects but also the actions involving 
"imitating shapes and postures of hands and fingers" of others such as sign language and 
dancing motions. The latter two types of actions are often more complicated and require 
dexterity as compared to the former actions. 

In the action of imitating "the shape of hands and fingers" such as sign language, it is 
essential to estimate the shape of the hands and fingers. Furthermore, as compared to the 
imitation of the actions of the lower limbs, estimating the shape of the hands is significantly 
more important and difficult. In particular, human hands, which have a multi-joint structure, 
change their shapes in a complicated manner and often perform actions with self-occlusion 
in which the portion of one's own body renders other portions invisible. In the case of an 
artificial system, we can utilize a multi-camera system that records a human hand for 
imitative behaviours by surrounding it with a large number of cameras. However, all the 
animals that mimic the motions of others have only two eyes. To realize the reproduction of 
the actions of hands and fingers by imitating their behaviour, it is desirable to adopt a 
single-eye or double-eye system construction. 

To roughly classify conventional hand posture estimation systems, the following two types 
of approaches can be used. The first approach is a 3D-model-based approach (Rehg & 
Kanade, 1994; Kameda & Minoh, 1996; Lu et al., 2003) that consists of extracting the local 
characteristics, or silhouette, in an image recorded using a camera and fitting a 3D hand 
model, which has been constructed in advance in a computer, to it. The second approach is a 
2D-appearance-based approach (Athitos & Scarloff, 2002; Hoshino & Tanimoto, 2005) that 
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consists of directly comparing the input image with the appearance of the image stored in a 
database. The former is capable of high-accuracy estimations of the shape, but it is weak 
against self-occlusion and also requires a long processing time. The latter can reduce the 
computation time; however, if 3D changes in the appearance of hands are not an issue, 
which also include the motions of the wrist and the forearm, a large-scale reference database 
is required, and it becomes difficult to control the robot hand by means of imitation. 
However, if the fundamental difficulty in the estimation of the hand posture lies in the 
complexity of the hand shape and self-occlusion, the high-accuracy estimation of the shape 
will become theoretically possible; this requires the preparation of an extensive database 
that includes hand images of all the possible appearances with complexity and self- 
occlusion. The feasibility of this approach depends on the search algorithm used for rapidly 
finding similar images from an extensive database. 

Therefore, in this study, we aim at realizing a system for estimating the human hand shape 
capable of reproducing actions that are the same as those of human hands and fingers to a 
level reproducible using a robot hand at high speeds and with high accuracy. With regard to 
the processing speed, we aimed to achieve short-time processing of a level enabling the 
proportional derivative (PD) control of a robot. With regard to the estimation accuracy, we 
aimed at achieving a level of estimation error considered almost equivalent to that by the 
visual inspection of a human being, namely, limiting the estimation error in the joint angle 
to within several degrees. In addition to those two factors, dispersion in the estimation time 
due to the actions of human hands and fingers causes difficulties in the control of a robot. 
Therefore, we also aimed to achieve uniformity in the estimation time. In particular, in order 
to conduct high-speed searches of similar data, we constructed a large-scale database using 
simple techniques and divided it into an approximately uniform number of classes and data 
by adopting the multistage self-organizing map (SOM) process (Kohonen, 1988), including 
self-multiplication and self-extinction, and realized the high-speed and high-accuracy 
estimation of the shape of a finger within a uniform processing time. We finally integrated 
the hand posture estimation system (Hoshino & Tanimoto, 2005; 2006) with the humanoid 
robot hand (Hoshino & Kawabuchi, 2005; 2006) designed by our research group; our final 
system is referred to as the "copycat hand". 

2. System construction 

2.1 Construction of a large-scale database 

First, we conducted measurements of the articular angle data (hereafter referred to as the 
"key angle data"). In the present study, we determined the articular angle at 22 points per 
hand in the form of a Euler angle by using a data glove (Cyberglove, Virtual Technologies). 
For the articular data in the shape of a hand that requires a high detecting accuracy, 
especially in the search for similar images, a large number of measurements were made. 
Furthermore, we generated CG images of the hand on the basis of the key articular angle 
data. The CG editing software Poser 5 (Curious Labs Incorporated) was used for the 
generation of images. 

Second, from the two key angle data, we interpolated a plurality of the articular angle data 
in optional proportions. The interpolation of the articular data is linear. Moreover, we also 
generated the corresponding CG images of the hand on the basis of the interpolated data. 
This operation enables an experimenter equipped with the data glove to obtain CG images 
of the hand in various shapes with the desired fineness without having to measure all the 
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shapes of the hand. Fig.l shows a schematic chart of the interpolation of the articular angle 
data and the CG images of the hand. Furthermore, Fig.2 shows an example of the 
interpolated CG images of the hand. This figure represents an example of a case where the 
articular angle was measured at three different points in time for the actions of changing 
from 'rock' to 'scissors' in the rock-paper-scissors game, and the direct generation of CG and 
the generation of CG using interpolation were made from two adjoining data. In both these 
figures, the three images surrounded by a square represent the former, while the other 
images represent the latter. 
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Fig. 1. Interpolation of the articular angle data and CG images of the hand. 
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Fig. 2. Examples of the interpolated CG images of the hand. 

Third, we added the data describing the differences among individuals. Because of the 
differences that exist among individuals (as shown in Fig. 3), a wide variety of data is 
required for a database intended for searching similar images. For example, in the hand 
shape representing 'rock' in the rock-paper-scissors game, a significant difference 
among individuals is likely to appear in (1) the curvature of the coxa position of the 
four fingers other than the thumb and (2) the manner of protrusion of the thumb coxa. 
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Moreover, differences are likely to appear in (3) the manner of opening of the index and 
the middle finger and (4) the standing angle of the reference finger in the 'scissors' 
shape, and also in (5) the manner of opening and (6) the manner of warping, etc. of the 
thumb in the 'paper' shape. In order to express such differences among individuals in 
the form of the CG hand, we need to adjust the parameters of the length of the finger 
bone and the movable articular angle; therefore, we generated the CG images of hands 
having differences among individuals on the basis of the articular angle data obtained 
by the procedure described above. Fig.4 indicates an example of the additional 
generation of the CG hand in different shapes. In the figure, the X axis shows CG hands 
arranged in the order starting from those with larger projections of the thumb coxa, 
while the Y axis represents those with larger curvature formed by the coxa of the four 
fingers other than the thumb, respectively. 
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Fig. 3. Examples of the differences among individuals. 

By performing the first to third steps mentioned above, we generated a total of 15,000 CG 
hand images using this system. 

Then, the resolution was changed. Although the CG image generated this time had a 
resolution of 320 x 240 pixels, a substantial calculation time is required in order to estimate 
the posture and for applying various image processing techniques. In the present study, a 
reduced resolution of 64 x 64 was used. The pixel value after the resolution was changed is 
given by the following expression: 

gr(i,j) = - Y Y go(i * 320 / 64 + k,j * 320/64 + /) (!) 
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Fig. 4. Examples of the supplemented data of the differences among individuals. 

Here, gr(i,j) and go(i,j) are the pixel values at row i and column j after and before altering the 
resolution, respectively. Here, the calculation has also been vertically conducted with 320 
pixels in order to match the aspect ratio since the pixel resolution was altered to 64 x 64. 
Furthermore, k and I correspond to the row and column, respectively, within the respective 
regions before changing the resolution, and r = kxl. 

Finally, the contour was extracted. Differences exist in the environmental light, colour of 
human skin, etc. in the input images. The abovementioned factors were eliminated by 
extracting the contour in order to fix the width and the edge values, and the estimation 
errors were reduced by reducing the difference between the hand images in the database 
and in the input data. 



2.2 Characterization 

In the present study, we used the higher-order local autocorrelational function (Otsu & 
Kurita, 1998). The characteristics defined using the following expression were calculated 
with respect to the reference point and its vicinity: 



1 {a l ,a 1 ,--,a N )= jf(r)f(r + «,)■■■ f(r + a N )dr 



(2) 



Here, x N is the correlational function in the vicinity of the point r in dimension N. Since the 
pixels around the object point are important when a recorded image is generally used as the 
processing object, the factor N was limited up to the second order in the present study. 
When excluding the equivalent terms due to parallel translation, x N is possibly expressed 
using 25 types of characteristic quantities, as shown in Fig. 5. However, patterns Ml through 
M5 should be normalized since they have a smaller scale than the characteristic quantities of 
patterns M6 and thereafter. By further multiplying the pixel values of the reference point for 
patterns M2 through M5 and by multiplying the square of the pixel value of the reference 
point for pattern Ml, a good agreement with the other characteristic quantities was obtained. 
In the present study, an image was divided into 64 sections in total -8x8 each in the 
vertical and lateral directions - and the respective divided images were represented by 25 
types of characteristic quantities using the higher-order local autocorrelational function. 
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Therefore, a single image is described using the characteristic quantities of 25 patterns x 64 
divided sections. The image characteristics of the CG hand and the joint angle data were 
paired as a set for preparing the database. 
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Fig. 5. Patterns of the higher-order local autocorrelational function. 

2.3 Self-organization of the database 

If the database prepared in the preceding sections is directly used for searching, it increases 
the search time together with a larger database. Hence, we intend to narrow the search 
space by clustering data with similar characteristics in the database. For example, sorting by 
using a dichotomizing search may be feasible for ordinary data; however, in the case where 
the characteristics range over multiple dimensions, a limitation is that the number of 
searches during a retrieval becomes the same as that in the total search. Therefore, we 
constructed a database using Kohonen's SOM (Kohonen, 1988). 

Each database entry has a joint angle and a number of image characteristics; however, only 
the image characteristics are used in the search during estimation. There is a possibility that 
there exist data that have similar characteristics but significantly different joint angles; such 
data may be included in the same class if the classification is made on the basis of the 
characteristics during the self-organization of the database. On the other hand, there also 
exist data having significantly different characteristics, although the joint angles are similar. 
Therefore, we performed self-organization for both these types of data and conducted 
preliminary experiments; the obtained results are listed in Table 1. The mean value of the 
errors and the standard deviation are the values for the middle finger. The data for the other 
fingers are omitted from the table since they exhibited similar tendencies. Degree is used as 
the unit of the mean value of the errors and the standard deviation. As shown in the table, 
the case of self -organization on the basis of characteristics yielded better results. 
Consequently, we performed data clustering using self-organization on the basis of 
characteristics in the present study. 





processing time [ms] 


mean error [degree] 


standard deviation 


joint angle 


0.842 


0.792 


6.576 


characteristics 


0.764 


0.373 


5.565 



Table 1. Performance of self-organization on the basis of joint angles and characteristics in 
the preliminary experiment. 
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First, we prepared classes having the representative angle, representative number of 
characteristics and neighbourhood class information as classes in the initial period. For the 
initial angles and the number of characteristics, random numbers in the range of to 1 were 
used. With regard to the neighbourhood class information, we calculated the distance 
between classes in the angles by using the Euclidean distance and determined classes close 
to one another in this distance as neighbouring classes; this information was retained as the 
class number. Although the number of neighbouring classes depends on the scale of the 
database and the processing performance of the PC, we studied it heuristically in this 
experiment, and determined classes up to that close to the eighth as the neighbour classes. 
Next, we calculated the distance in the characteristics between the data and the classes and 
selected the closest class by using the data in a secondary database. This class will hereafter 
be referred to as the closest neighbour class. Moreover, the used date will be considered as 
those belonging to the closest neighbour class. The representative angle and representative 
number of characteristics of the closest neighbour class were renewed by using the 
expression below so that they may be placed closer to the data. 

CA^CAy-aiCAij-DArj) 

CF^CFtj-aifiFij-DFrj) 

where CAjj denotes the representative angle j of class i; DA r p the angle j of data r; CFjj, the 
representative number of characteristics j of class i; DF r j, the representative number of 
characteristics j of data r; and a, the coefficient of learning. 

In this experiment, a was heuristically determined as 0.0001. Next, a similar renewal was 
also made in the classes included in the neighbour class information of the closest neighbour 
class. However, their coefficient of learning was set to a value lower than that of the closest 
neighbour class. In the present study, it was heuristically selected as 0.01. This was applied 
to all the data in the primary database. In order to perform self-organization, the 
abovementioned operation was repeated until there was almost no change in the 
representative angle and the representative number of characteristics of the class. 
Narrowing and acceleration of the search process can be realized to some extent, even if the 
database is used without self-organization. However, if such a database is used, dispersion 
is observed in the amount of data included in each class, thereby inducing dispersion in the 
processing time. Therefore, we intended to avoid the lack of uniformity in the processing 
time by introducing an algorithm for self-multiplication and self-extinction during self- 
organization. After selecting the class of adherence for all the data, we duplicate the classes 
that contain an amount of data exceeding 1.5 times the ideal amount. In addition, we deleted 
the classes containing an amount of data no more than one-half the ideal amount of data. 
Therefore, the amount of data belonging to each class was maintained within a certain range 
without significant dispersion, and the processing time was maintained within a certain 
limit, irrespective of the data in the class that was used for searching during the estimation. 
In case the algorithm for self-multiplication and self-extinction is introduced, a change is 
produced in the relationships among the classes, which remains unchanged in ordinary self- 
organization, making it necessary to redefine the relationships among the classes. Therefore, 
we newly prepared the neighbour class information by a method similar to that used during 
initialization in which we duplicated and deleted the classes. 

Estimations made by using a database obtained in this manner can considerably increase the 
search speed as compared to the complete search of ordinary data. However, considering 
further increases in the database and acceleration of the searches, the database clustering was 



312 



Humanoid Robots, New Developments 



performed not only in a single layer but also in multiple layers. Fig. 6 shows the schematic 
structure of the multiple layers. The class obtained with the aforementioned processing is 
defined as the second-layer class and is considered as data. A third-layer class is prepared by 
clustering the second-layer classes as the data. The third-layer class is prepared by following 
the same procedure as that used in the preparation of the second-layer class. Further, a fourth- 
layer class is prepared by clustering the third-layer classes. The lesser the amount of data in 
one class (or the number of classes in the lower layers), the higher the layer in which clustering 
can be performed. However, to absorb the dispersion of data, etc., it is preferable to prepare 
classes having an amount of data with a certain volume. Table 2 lists the results of the 
preliminary experiment in which clustering was performed by setting the amount of data in a 
class at 5, 10 and 20. Although the search time is reduced if the clustering is performed with a 
small amount of data, the estimation accuracy also reduces accordingly; therefore, we set an 
ideal amount of data as 10 in the present study as a trade-off between the two parameters. 
The clustered database obtained using the abovementioned operation was termed as a 
tertiary database. This tertiary database will hereafter be simply referred to as the database. 
In this system, we finally constructed a database comprising 5, 10 and 10 classes in order 
from the upper layers, where each class has approximately 10 data items. 




Fig. 6. Schematic structure of a database with multiple layers. 





processing time [ms] 


mean error [degree] 


standard deviation 


5 


0.656 


-0.035 


5.868 


10 


0.764 


0.373 


5.565 


20 


1.086 


0.145 


5.400 



Table 2. Performance according to the amount of data in a class in the preliminary 
experiment. 



2.4 Search of similar images 

During estimation, sequential images were acquired using a high-speed camera. In a manner 
similar to the preparation of the database, image processing techniques were applied to these 
images to obtain their characteristic quantities. By comparing each quantity with that in the 
database by means of a processing technique described later, the joint angle information that 
formed a pair with the most similar image were defined as each result was estimated. 
To estimate the similarity at the first search, the distance was calculated by using the 
characteristic quantity for all classes in the database. The calculation was performed by 
simply using the Euclidean distance that is derived using the expression below: 
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(4) 



; = 1 



Here, both X-n and xh are characteristic quantities i with the higher-order local autocorrelational 
functions of the class r and at the time t, respectively. The class that rninirnizes E,- was selected as 
the most vicinal class at time t. With respect to the affiliated data of the most vicinal class and all 
the vicinal classes of the most vicinal class, the distances from the characteristic quantities 
obtained from the image were calculated using expression (4). At each instance, the angle of the 
data with the shortest distance was regarded as the estimated angle. From the second search, the 
distance was not calculated by using the characteristic quantity for all the classes in the database. 
Instead, only the vicinal classes of the most vicinal class and the affiliated data were selected as 
the candidates for the search according to the histories at f-1, as shown in Fig.7. 

(a) at first search: all classes are candidates for the search. 




(b) from second search, the vicinal classes of the most vicinal class are candidates. 




(c) if the result moves to and affiliates with another class, 




(d) then, the search space and candidate classes moves. 




Fig. 7. Differences in the search spaces between the first search and the succeeding searches. 
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3. Experiment of posture estimation 

3.1 Methods and procedures 

In order to verify the effectiveness of this system, the actual images were subjected to 
experimental estimation. A subject held up a hand at a position approximately 1 m in front 
of the high-speed camera and moved the fingers freely provided the palm faced the camera. 
A slight motion of the hand was allowed in all the directions provided the hand was within 
the field angle of the camera. We employed a PC (CPU: Pentium 4, 2.8 GHz; main memory: 
512 MB) and a monochromatic high-speed camera (ES-310/T manufactured by MEGAPLUS 
Inc.) in the experiments. 



3.2 Results and discussions 

Fig.8 shows the examples of the estimation. Each estimated result plotted using the 
wireframe model was superimposed on the actual image of a hand. It is evident that the 
finger angles have possibly been estimated with a high precision when the hand and fingers 
were continuously moved. It was verified that the estimation could be performed, provided 
the hand image did not blend into the background, even if the illuminating environment 
was changed. 






ILj 



Fig. 8. Captured hand images and the results of the hand posture estimation. 

For the purpose of a quantitative assessment of the system, the measured and estimated 
values have to be compared. However, in an ordinary environment using this system, it is 
impossible to acquire the measured values of the joint angle information from the human 
hand and fingers moving in front of the camera. Consequently, we performed the 
estimation experiment by wearing the data glove and a white glove above it. The results 
are shown in Fig. 9, which reveals the angular data measured using the data glove and the 
estimated results. Fig. 9(a) shows the interphalangeal (IP) joint of the thumb; Fig. 9(b), the 
abduction between the middle and ring fingers; and Fig. 9(c), the proximal interphalangeal 
(PIP) joint of the middle finger. The state where the joint is unfolded was set as 180 
degrees. The system at this time operates at more than 150 fps and thus enables realtime 
estimation. 
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Fig. 9. Examples of the joint angle data measured using the data glove and the estimated 
results. 



As evident from the figure, the standard deviation of the errors in the estimated angles was 
4.51 degrees when we avoided the fluorescent light and used the angular data obtained by 
means of the data glove as the actual values; the results obtained did not have highly precise 
numerical values. We observed a trend of poor estimations, particularly for parts with little 
variation in the image (for example, the shape of the rock in the rock-paper-scissors game) 
against the angular variation. This may be expected, considering that a human is performing 
the figure estimation. In other words, we can hardly observe any difference visually for an 
angular difference of 10 degrees when each finger has a difference of 10 degrees. Therefore, 
the errors in this system, which conducts estimation on the basis of the camera image, may 
be considered as being within the allowable range. On the contrary, it can be observed from 
this figure that highly precise estimations are made in the region where visual differences 
are observed, namely, where the image changes significantly with the angular variations 
and where it is located in between the flexion and the extension. 
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Next, the comparative experiments were conducted. The difference between the previous 
experiment and these comparative experiments is that the hand position agrees with or 
closely resembles the database image since the object for estimation is set by selecting the 
CG hand image from the database. Consequently, we can determine the expected 
improvement in the estimating precision when the processing for positioning the input 
image is integrated into this system. The standard deviation of the errors when estimating 
the object was set to 2.86 degrees by selecting the CG image from the database, thus 
allowing very high-precision estimation. It is expected that the estimation error can be 
reduced to this extent in the future by integrating the processing for correcting the position 
into this system. Moreover, the processing time for the search, except for the image 
processing, is 0.69 ms per image. From the viewpoint of precision and processing speed, the 
effectiveness of the multi-step search using the self -organized database has been proved. 
As mentioned above, the estimation error for unknown input images had a standard 
deviation of 4.51 degrees. Since this is an image processing system, small variations in the 
finger joints in the rock state of the rock-paper-scissors game will definitely exhibit a 
minimal difference in the appearance; these differences will numerically appear as a large 
error in the estimation. However, this error possibly contains calibration errors arising from 
the use of the data glove, as well as the errors caused by slight differences in the thickness, 
colour, or texture of the data glove covered with the white glove. Therefore, the output of 
the data glove or the actual value of the quantitative assessment requires calibration 
between the strain gauge output and the finger joint value whenever the glove is worn since 
the joint angle is calculated from a strain gauge worn on the glove. No such calibration 
standards exist, particularly for the state in which the finger is extended; therefore, the 
measured angle can be easily different from the indicated value. Even when the estimation 
is newly calibrated, it is possible that the state of calibration may be different in each 
experiment. On the other hand, it is not necessary to apply calibration to the second 
experiment that selects the CG hand image from the database. It is highly possible that this 
influences the standard deviation value of 4.51 degrees; therefore, it is possible to consider 
that the standard deviation of the errors lies between 4.51 and 2.86 degrees even if the 
system has not been subjected to corrective processing for the hand position. 
The scheme of the present study allows you to add new data even without understanding 
the system. Another advantage is that the addition of new data does not require a long time 
since it is unnecessary to reorganize the database even when several new data items are 
added; this is because the database can sequentially self-organize itself by using the 
algorithm for self -multiplication and self-extinction of database classes. Furthermore, it is 
possible to search the neighbouring classes having angular similarities since each class 
possesses information about the vicinal classes in this system. This fact can also be regarded 
as the best fit for estimating the posture of a physical object that causes successive temporal 
angular variations, such as estimating the posture of the human hand. We attempted to 
carry out the hand posture estimation when the hand is rotated, although the number of 
trials was inadequate. Fig.10 shows an example of the result, which suggests that our system 
functions when a subject is in front of the camera and is rotating his/her hand. A subject can 
also swing the forearm, and our system can effectively estimate the shape of the fingers, as 
shown in Fig.ll. 

The image information and the joint angle information are paired in the database in our 
system. Once we output the results of the hand posture estimation to a robot hand, the robot 
can reproduce the same motions as those of the fingers of a human being and mimic them. 
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Fig.12 shows a dexterous robot hand (Hoshino & Kawabuchi, 2005) imitating the human 
hand motions without any sensors attached to it. We refer to this integrated system as the 
"copycat hand". This system can generate imitative behaviours of the hand because the 
hand posture estimation system performs calculations at high speeds and with high 
accuracy. 




Fig. 10. An example of hand posture estimation using the rotating motion of the wrist. 
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Fig. 11. Examples of the estimation when a subject is swinging his/her forearm. 




Fig. 12. The copycat hand can ape and imitate human hand motions at high speeds and with 
high accuracy. 



4. Conclusion 

To realize a robot hand capable of instantly imitating human actions, high speed, high 
accuracy and uniform processing time in the hand posture estimation are essential. 
Therefore, in the present study, we have developed a method that enables the searching of 
similar images at high speeds and with high accuracy and the search involves uniform 
processing time, even in the case where a large-scale database is used. This is achieved by 
(1) clustering databases having approximately uniform amounts of data using self- 
organization, including self-multiplication and self-extinction and (2) by collating the input 
images with the data in the database by means of the low-order image characteristics, while 
narrowing the search space in accordance with the past history. 

In the preliminary construction of the database, we generated CG images of the hand by 
measuring the joint angles using a data glove and interpolating them; furthermore, we 
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extracted the contours, image characteristics and the characteristics that change only in the 
hand shape, irrespective of the environmental light or skin colour. The image was divided 
into several images and was converted into a number of characteristics by using the high- 
order local autocorrelation function; the image was then saved in the database in a state 
paired with the joint angle data obtained from a data glove. By clustering this database 
using self-organization depending on the number of characteristics and by the self- 
organization of classes in multiple stages, a multistage search was enabled using the 
representative numbers of classes in several layers. Moreover, by incorporating self- 
multiplication and self-extinction algorithms, we achieved a unification of the amount of 
data belonging to each class as well as the number of classes in the lower layers to avoid the 
dispersion of the search time in the classes. 

The input image at the time of an actual estimation of the hand finger shape was subjected 
to various types of image processing techniques in the same manner as that at the time of 
construction of the database, and it was converted into a number of characteristics. The 
distance from the number of characteristics obtained from the picture was calculated by 
using a representative number of characteristics. Classes at close distances were selected as 
candidate classes for the estimated angle, and a similar distance calculation was also 
performed in the classes in each layer belonging to a candidate class for the estimated angle. 
Among the respective data belonging to the candidate classes for the estimated angle in the 
lowest class, the angle data of the data with the closest distance between the number of 
characteristics was considered as the estimation result. Furthermore, for the selection of a 
candidate class, we attempted to reduce the search space by using the previous estimation 
results and the neighbour information. 

By estimating the sequential images of the finger shape by using this method, we 
successfully realized a process involving a joint angle estimation error within several 
degrees, a processing time of 150 - 160 fps, and an operating time without dispersion by 
using a PC having a CPU clock frequency of 2.8 GHz and a memory capacity of 512 MB. 
Since the image information and the joint angle information are paired in the database, the 
system could reproduce the same actions as those of the fingers of a human being by means 
of a robot without any time delay by outputting the estimation results to the robot hand. 
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Appendix: Composition of the humanoid robot hand 

(Hoshino & Kawabuchi, 2005; 2006) 

As compared to walking, the degree of freedom (DOF) assigned to manipulation functions 
and to fingers is extremely low. The functions of the hands are mostly limited to grasping 
and holding an object and pushing a lever up and down. The robot hand itself would tend 
to become larger and heavier and it would be almost impossible to design a slender and 
light-weight robot if currently available motors and reduction gears are used with a number 
of DOFs equivalent to that of the human hand. It is important to determine where and how 
to implement the minimum number of DOFs in a robot hand. 

We have designed the first prototype of a dexterous robot hand. The length from the 
fingertip to the wrist is approximately 185 mm and the mass of the device is 430 g, which 
includes mechanical elements such as motors with encoders and reduction gears without 
electrical instrumentation such as motor control amplifiers, additive sensors, or cables for 
external connection. 

Fig.13 shows two examples of generating movements involved in Japanese sign language. In 
the case of the numeral 2, the index finger and the middle finger should be stretched during 
abduction and pass through a clearance generated by the thumb. Generating the numeral 30 
involves a difficulty. A ring is formed by the thumb and the fourth finger and the other 
three fingers are stretched while exhibiting abduction and then bent to a suitable angle. As 
for the two examples generated by this system, movements were carried out promptly while 
maintaining an appropriate accuracy in order to facilitate a reasonable judgment of the 
numerals created by using the sign language. The time duration of the movement is slightly 
over 1 s for the numeral 2 and approximately 2 s for the numeral 30. 

An important function for the robot hand is picking up small, thin, or fragile items using 
only the fingertips. This capability is equally or even more important than the ability to 
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securely grasp a heavy object. Therefore, we designed a second prototype focusing on the 
terminal joint of the fingers and the structure of the thumb. 



****)!» 




* 




Fig. 13. Examples of the sign language movements. 

As an independent DOF, we implemented a small motor at every fingertip joint, namely at 
distal interphalangeal (DIP) joints of four fingers and interphalangeal (IP) joint of the thumb. 
The mass of the motor is approximately 10 g with a gear. Although the maximum motor 
torque is very small (0.5 Niran), the maximum fingertip force is 2 N because of the high- 
speed reduction ratio and the short distance between the joint and the fingertip, which 
provides sufficient force for picking up an object. Moreover, it has a wide movable range. 
Each fingertip joint can bend inward as well as outward, which, for instance, enables the 
robot hand to stably pick up a business card on a desk. 

We also added a twisting mechanism to the thumb. When the tips of the thumb and fingers 
touch, the contact is at the fingertip and the thumb pads; however, this may not provide a 
sufficient contact with the other fingertip pads since the thumb cannot twist to make this 
contact. The human hand has soft skin and padding at the fingertips and the high control of 
motion and force at the fingertips enables stable pinching even if the finger pads are not in 
complete mutual contact. However, we expect that the fingertip force produced by the 
terminal joint drive at the tip of the two finger groups will act in opposite directions at the 
same point, implying that the two fingertips will oppose each other exactly at the pad. 
Fig.14 shows the snapshots of the performance of the second type of robot hand, which 
repeated the series of movements and stably pinched the business card. The mass of the 
hand is approximately 500 g and the length from the fingertip to the wrist is approximately 
185 mm, which are almost equivalent to those of the human hand. 




Fig. 14. Snapshots of the robot hand handling a business card using two or three fingers. 
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1. Introduction 

Recently, a lot of researches which aim at realization of dynamic biped walking are being 
performed. There is Honda's ASIMO as a representative of researches of humanoid robots. 
ASIMO has joints of many degrees of freedom that are near to a human being, high 
environment adaptability and robustness, and can do various performances. However, it 
needs many sensors, complicated control, and walks with bending a knee joint to keep the 
position of a centre of gravity constant. Therefore, it walks unnaturally and consumes much 
energy. 

On the other hand, McGeer performed the research of passive dynamic walking from the 
aspect of that it is natural motion in a gravitational field (McGeer, T., 1990). This robot 
which could go down incline only using potential energy was developed and realized the 
energy-efficient walking. However, it needs incline, and its applied range is small because 
it has no actuator. Therefore, the researches that aimed at energy-efficient biped walking 
on level ground have been performed. S.H.Collins exploited the robot which had 
actuators at only ankles (Collins, S. H. & Ruina A., 2005). M.Wisse exploited the robot 
which used pneumatic actuators (Wisse, M. & Frankenhuyzen, J. van, 2003). Ono 
exploited the self-excitation drive type robot which had an actuator only at hip joint (Ono, 
K. et al, 2001); (Ono, K. et al, 2004); (Kaneko, Y. & Ono K„ 2006). And then, Osuka and 
Asano performed the level ground walking from a point of view to mechanical energy for 
joints which is the same with the energy consumed of passiveness walk (Osuka, K. et al, 
2004); (Asano, F. et al, 2004). These biped robot's studies used the technique of the passive 
dynamic walking which used inertia and gravity positively by decreasing the number of 
actuators as much as possible. However, in order to adapt the unknown ground, the 
biped robot needs actuators to improve the environment adaptability and robustness. 
Here, Ono proposed the optimal trajectory planning method based on a function 
approximation method to realize an energy-efficient walking of the biped robot with 
actuators similar to a passive dynamic walking (Imadu, A. & Ono, K. 1998); (Ono, K. & 
Liu, R., 2001); (Ono, K. & Liu, R., 2002); (Peng, C. & Ono K., 2003). Furthermore, Huang 
and Hase verified the optimal trajectory planning method for energy-efficient biped 
walking by experiment, and proposed the inequality state constraint to obtain better 
solution which is desirable posture in the intermediate time of the walking period (Hase, 
T. & Huang, Q., 2005); (Huang, Q. & Hase, T„ 2006); (Hase, T., et al., 2006). 
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In this chapter, we introduce the newest researches on energy-efficient walking of the biped 
robot for level ground form two viewpoints, one is semi-passive dynamic walking with only 
hip actuator using self-excited mechanism, another is active walking with actuators using 
optimal trajectory planning. 

The chapter is organized as follows. In section 2, the self -excited walking of a four-link 
biped robot and the self-excitation control algorithm enables the robot to walk on level 
ground by numerical simulation and experiment will be introduced. In section 3, we aim at 
realizing an energy-efficient walking of the four-link biped robot with actuators similar to a 
passive dynamic walking. An optimal trajectory planning method based on a function 
approximation method applied to a biped walking robot will be shown. And then, we use 
the inequality state constraint in the intermediate time and restrict the range of joint angles. 
In this way, a better solution which is desirable posture in the intermediate time can be 
obtained. Furthermore, in section 4, with "Specific Cost", we show that the biped walking 
with the above two methods have more efficient energy than the other methods which use 
geared motors. Finally, the conclusions will be presented in section 5. 

2. Self-Excited Walking for Biped Mechanism 

In this section, we introduce a study on the self-excited walking of a four-link biped 
mechanism that proposed by Ono (Ono, K. et al, 2001). And then, we show that the self- 
excitation control enables the three-degree-of-freedom planar biped model to walk on level 
ground by numerical simulation. From the parameter study, it was found that stable 
walking locomotion is possible over a wide range of feedback gain and link parameter 
values and that the walking period is almost independent of the feedback gain. Various 
characteristics of the self-excited walking of a biped mechanism were examined in relation 
to leg length and length and mass ratios of the shank. Next, a biped mechanism was 
manufactured similar to the analytical model. After parameter modification the authors 
demonstrated that the biped robot can perform natural dynamic walking on a plane with a 
0.8 degree inclination. The simulated results also agree with the experimental walking 
locomotion. 

2.1 Analytical Model of Biped Walking Robot and Kinetic Process 

2.1.1 Features of Biped Locomotion and Possibility of Its Self-Excitation 

Fig.l shows a biped mechanism to be treated in this study. Here we focus only on the biped 
locomotion in the sagittal plane. The biped mechanism does not have an upper body and 
consists of only two legs that are connected in a series at the hip joint through a motor. Each 
leg has a thigh and a shank connected at a passive knee joint that has a knee stopper. By the 
knee stopper, an angle of the knee rotation is restricted like the human knee. The legs have 
no feet, and the tip of the shank has a small roundness. The objective of this study is to make 
the biped mechanism perform its inherent natural walking locomotion on level ground not 
by gravitational force but by active energy through the hip motor. 

The necessary conditions for the biped mechanism to be able to walk on level ground are as 
follows: (1) The inverted pendulum motion for the support leg must synchronize with the 
swing leg motion. (2) The swing leg should bend so that the tip does not touch the ground. 
(3) The dissipated energy of the mechanism through collisions at the knee and the ground, 
as well as friction at joints, should be supplied by the motor. (4) The knee of the support leg 



Energy-Efficient Walking for Biped Robot Using 

Self-Excited Mechanism and Optimal Trajectory Planning 323 

should not be bent by the internal force of the knee stopper. (5) The synchronized motion 
between the inverted pendulum motion of the support leg and the two-DOF pendulum 
motion of the swing leg, as well as the balance of the input and output energy, should have 
stable characteristics against small deviations from the synchronized motion. 



Walking direction 




Swing leg 
Stopper 



Fig.l. Three-degree-of freedom walking mechanism on a sagittal plane (Ono, K. et al, 2001) 

First we pay attention to the swing leg and try to generate a swing leg motion that can 
satisfy the necessary conditions (2) and (3) by applying the self-excitation control to the 
swing leg motion. Ono and Okada (Ono, K. & Okada, T., 1994) have already investigated 
two kinds of self-excitation control of two-DOF vibration systems and showed that the Van 
der Pol-type self-excitation can evoke natural modes of the original passive system, while 
the asymmetrical stiffness matrix type can excite the anti-resonance mode that has a phase 
shift of about 90 degrees between input and output positions. 

The two-DOF pendulum of a swing leg has the first-order mode with an in-phase at each 
joint and the second-order mode with an out-of-phase at each joint. Thus, it will be 
impossible to generate a walking gait by applying the Van der Pol-type self-excitation. In 
contrast, by means of the negative feedback from the shank joint angle 3 to the input 
torque T at the thing joint, the system's stiffness matrix becomes asymmetrical. Thus, the 
swing motion would change so that the shank motion delays at about 90 degrees from the 
thigh motion. Through this feedback, it is also expected that the kinetic energy of the swing 
leg increases and that the reaction torque (=T) will make the support leg rotate in the 
forward direction in a region where # 3 > 0. The self-excitation of the swing leg based on 
the asymmetrical matrix is explained in detail below. 

2.1.2 Self-Excitation of the Swing Leg 

Fig.2 depicts the two-DOF swing leg model whose first joint is stationary. To make Fig.. 2 
compatible with Fig. 3 (b), the upper and lower links are termed the second and third links, 
respectively. To generate a swing motion like a swing leg, only the second joint is driven by 
the torque T2, which is given by the negative position feedback of the form 
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T 2 =-k0 3 . 



(1) 



Joint2 
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Fig.2. Analytical model of two-degree-of-freedom swing leg (Ono, K. et al, 2001) 

From the fundamental study of the asymmetrical stiffness matrix-type self-excitation (Ono, 
K. & Okada, T., 1994), it is known that damping plays an important role in inducing the self- 
excited motion. Thus, the viscous rotary damper with coefficient y 3 is applied to joint 3. 

Taking Eq. (1) into account, the equation of motion for the two-DOF pendulum system is 
written as 
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We note from Eq. (2) that the stiffness matrix of the system becomes asymmetrical because 
of the feedback gain k. 

To examine the value of k to excite a swing leg motion autonomously and the natural 
frequency and mode at the threshold, Eq. (2) is linearized about 2 = 3 = . The linearized 
equation of motion becomes 
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Putting 02 = ® 2 e At and 3 = ® 3 e xt into Eq. (3), we have 
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Where a-^ = 1 2 + m 2 a 2 + m 3 fl 2 / fl 12 = fl 21 = m 3 fl 3^2 / fl 22 = ^3 +m 3 fl 3 ' ^11 = ( m 2 a 2 + m 3^l) S i 
k 22 = m 2 a 3 g . 

From the condition that the determinant in Eq. (4) is zero, we can obtain the characteristic 
equation of the linearized system of the form 

A X 4 + Aj/l 3 + A 2 X 2 + A 3 X + A 4 = 0. (5) 

As k increases, one of the eigenvalues becomes unstable. At the boundary, X becomes an 
imaginary number. Thus, putting X = coi into Eq. (5), we can get the critical k value and the 
natural frequency at the inst ability threshold. 

Here we assume that both links are a uniform bar whose length is 0.4 m ( = l 2 = Z 3 ), and the 
mass is 2 kg ( = m 2 =m 3 ). For these link parameters, one of the natural modes becomes 
unstable when k > 1.43 or k < -3.14. The natural frequency of the unstable swing motion at 
k= 1.43 is 0.71 Hz, whereas that at k = -3.14 is 0.61 Hz. If the damping does not exist at joint 3, 
one of the natural modes becomes unstable when 30.4 > K > 6.3. Note that the damping 
coefficient y 3 , however small it might be, can significantly reduce the critical k value. 
Strangely enough, the magnitude of the damping coefficient y 3 does not change the critical 
k value and the natural frequency at the threshold. However, the damping coefficient value 
slightly influences the natural mode of the unstable vibration. By putting the natural 
frequency and k value into Eq. (4) , we found a swing motion such that the lower link delays 
from the upper link are excited when k > 1.43. The magnitude and phase of 2 /® 3 are 
0.60 and 8.89 degrees, respectively, when y 3 = 0.15 Nms/rad. As the unstable vibration 
grows in magnitude and approaches a limit cycle, the phase of the 2 /0 3 tends to be 90 
degrees (Ono, K. & Okada, T., 1994). 

2.2 Analytical Model of Biped Locomotion and Basic Equations 

The next question is whether the biped mechanism, which combines the two-DOF swing leg 
discussed in Section 2.1 and the single-DOF support leg, can generate a biped locomotion 
that satisfies the conditions (1), (4), and (5). Since it is difficult to derive this solution 
analytically, we numerically show that the nonlinear biped system excited by the 
asymmetrical feedback exhibits a stable biped locomotion that satisfies the three conditions 
as a limit cycle of the system. 

Fig.3 (a) shows the representative postures of a biped mechanism during half of the period 
of biped locomotion. From an aspect of the difference of the basic equation, one step process 
can be divided into two phases: (1) from the start of the swing leg motion to the collision at 
the knee (the first phase) and (2) from the knee collision to the touchdown of the straight 
swing leg (the second phase). We assume that the change of the support leg to the swing leg 
occurs instantly and that the end of the second phase is the beginning of the first phase. The 
self-excitation feedback of (1) is applied only during the first phase. We assume that the 
support leg is kept straight because of internal reaction torque at the knee for simplifying 
the forward dynamic simulation. The validity of this assumption will be examined later. 
Under the assumption of a straight support leg, the analytical model during the first phase 
is represented as a three-DOF link system, as shown in Fig.3 (b). Viscous damping is applied 
to the knee joint of the swing leg. The equation of motion in this system during the first 
phase is written as 
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Where the elements M,,- , C« , and JQ of the matrices are shown in Appendix A. T2 is 
feedback input torque given by Eq. (1). 
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(b) Three-DOF model 



Second phase (touchdown) 

(a) Two phases of biped walking 
Fig.3. Analytical model of three-degree-of-freedom walking mechanism (Ono, K. et al, 2001) 

When the shank becomes straight in line with the thigh at the end of the first phase, it is 
assumed that the knee collision occurs plastically. From the assumption of conservation of 
momentum and angular momentum before and after the knee collision, angular velocities 

after the knee collision are calculated from the condition 2 = #3" ar, d the equation of the form 

-[MY 1 h(e 2 ,0 2 )-r , (V) 

h[^'03 ) 

Where the elements of the matrix [M] are the same as M;j in Eq. (6) . /1 , / 2 , and / 3 are 

presented in Appendix B. r is the moment impulse at the knee. 

During the second phase, the biped system can be regarded as a two-DOF link system. Thus 

the basic equation becomes 
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We assume that the collision of the swing leg with the ground occurs plastically (the 
translational velocity at the tip of the swing leg becomes zero) and that there is a sufficient 
friction between the tip and the ground surface to prevent slippage. The angular velocities 
of the links after the collision are derived from conservation laws of momentum and angular 
momentum. They are calculated from Eq. (7) by putting r = . 

To investigate the self-excited biped locomotion, a set of basic equations of (6), (7), and (8) 
were numerically solved based on the fourth-order Runge-Kutta method. The standard 
values of the link parameters used in the calculation are shown in Table 1. The values of 
mass and moment of inertia were calculated assuming uniform mass distribution along the 
link. Since the magnitude of the damping coefficient scarcely influences the biped gait, 
/3 = 0.15 Nms/rad was used in the calculation. 





Linkl 


Link2 


Link3 


mi 


mass [kg] 


4.0 


2.0 


2.0 


h 


length [kgm 2 ] 


0.21 


0.027 


0.027 


k 


Moment of inertia [m] 


0.80 


0.40 


0.40 


at 


Offset of mass center [m] 


0.40 


0.20 


0.20 



Table 1. Link Parameter Values Used for Simulation Study (Ono, K. et al, 2001). 



2.3 Simulation Results and Characteristics of Self-Excited Walking 

In this simulation, we assumed that the support leg is kept straight due to internal torque 
during the first and second phases. To check whether the support leg can be lept straight by 
the passive knee stopper, the internal torque at the knee of the two-DOF support leg model 
was inversely calculated by using the calculated stable biped locomotion. Fig.4 plots the 
angular displacement of 0^ - n , 2 , and 3 and the internal torques at the knees of the 

support and swing legs, which are calculated from 9\ , 2 , ar| d 03 based on the inverse 

dynamic analysis. The torque curve at </> (r = 0\ - 2 ^ . If the knee torque is negative, the 

reaction torque exists at the knee stpooer so that the support leg can be regarded as a single- 
DOF link. We observe that at the beginning of the first phase, the knee torque of the support 
leg becomes positive for a short time of less than 0.1 s. 

Then, we calculated the knee torque of the support leg by assuming an offset angle of 
(/> a = 2 -0 3 =2 degrees at the knee stopper. The calculated result is also shown in 

Fig.4. We note that the knee torque of the support leg shifts downward and changes 
to a negative value during the entire period. Moreover, it is seen from the negative 
knee torque of the swing leg that the swing leg can be kept straight during the 
secondphase. 

In the numerical sumulation, steady biped locomotion can be obtained as a limit cycle for 
the feedback gain value in a range of 4.8 Nm/rad < k < 7.5 Nm/rad. When k is smaller than 
4.8 Nm/rad, the initial step length gradually decreases and the biped mechanism falls down 
forward. When k is larger than 7.5 Nm/rad, the step changes irregularly and the biped 
mechanism falls down backwards because the swing leg is raised too high. Fig.5 shows the 
walking period, average velocity, and efficiency as a function of feedback gain k. The 
efficiency is defined as the ratio of average walking velocity to average input power, where 
the average input power is given by 
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I \0 2 T 2 \dt 

r 4 f t=t 



(9) 
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Fig.4. Motion of legs and internal torque at knee during one step (Ono, K. et al, 2001). 
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Fig. 5. Effect of feedback gain on walking period, velocity, and efficiency (Ono, K. et al, 
2001). 

We note from this figure that the walking period shows an almost constant value of 1.3 s 
at any k value. This is because the self-excited biped locomotion is a natural motion 
inherent in the biped system. On the other hand, the average velocity increases gradually 
because of the increase in step length. However, the efficiency decreases with an increase 
in the feedback gain k because the average input power increases more rapidly with an 
increase in k. It is worthy to note that the average input power to generate biped 
locomotion with a velocity of 0.28 m/s is only about 4 W (=0.28/0.07) at k = 6.0 Nm/rad. 
If the additional movement of 0.6 m by feet during one period of 1.3 s is taken into 
account, the natural walking velocity becomes 2.7 km/h. the natural walking period of an 
adult with a 0.9 m leg height is 1.3 s (Yang 1994). Thus, it can be said that the calculated 
results of the walking period and velocity are comparable with the natural walking of an 
adult with a 0.8 leg height. 
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Validity of the self-excited walking stated above was confirmed experimentally (Ono, K., 
2001). Extensions of the self -excited walking to a biped with feet and bent knee walking 
mode are described in (Ono, K., 2004) and (Kaneko, Y. & Ono K., 2006), respectively. 

3. Energy-Efficient Trajectory Planning for Biped Walking Robot 

In this section, we aim at realizing a energy-efficient walking of biped with actuators 
similar to a passive dynamic walking. We suggested optimal trajectory planning method 
based on a function approximation method, and applied for a biped walking robot 
analytically. We obtained the solutions which made the square integral value of input 
torque minimum. Until this point, we clarified the application of optimal trajectory 
planning method for biped walking robots, but they did not constrain the clearance 
between toe and ground at intermediate time of one period. As is generally known, it is 
necessary to give the clearance between toe and ground in the intermediate time of one 
step period. However, because there is the redundant degree of freedom by only 
constraining the clearance with equality state constraint, the solutions of undesired 
postures may be obtained. Therefore we suggest the optimal trajectory planning method 
with inequality state constraint that can restrict the joint angles to the desired range in a 
certain time. Concretely, we use inequality state constraint in the intermediate time and 
restrict the range of joint angles and may obtain a better solution which is desirable 
posture in the intermediate time. 

With this method, we calculate optimal walking gait for biped with upper body mass. This 
model is attached the same masses to the support leg and swing leg in forward direction. 
And we examine the influence of installation position of the upper body mass. 
And We perform a walking experiment by a prototype robot to inspect the effectiveness of 
this method. At first, in order to realize a stable walking, we attach a sensor at a sole to 
measure the time of grounding and compensate the error between reference trajectory and 
experimental trajectory. And, we install adjustment masses to revise parameters of the 
prototype robot. Furthermore, we use "Specific Cost" which shows energy consumption per 
unit mass and per unit movement velocity for evaluation of energy efficiency in section 4. 
By comparing the data of the other robots with this method, we show that our robot walks 
more efficiently than the other robots which use geared motors. 

3.1 Analytical Model of Biped Walking Robot and Kinetic Process 

3.1.1 Analytical Model and Kinetic Process 

Fig.6 shows the three link biped model used in this research. Link 1 is the support, link 2 is 

the thigh, and link 3 is the shank. This model has upper body masses which is the same 

masses attached to the support leg and swing leg in forward direction. Table. 3 shows the 

link parameters used in this analysis. 

Table.2 shows symbols used in this paper (Hase, T.; Huang, Q. & Ono, K., 2006). 

One step period is from Posture 1 to Posture 4 shown in Fig. 7. State variables of Posture 4 

are same as one of Posture 1. When it changes from Posture 3 to Posture 4, a collision 

between toe and ground happens, and link parameters exchange. In Posture 3, the swing leg 

collides with the ground, and becomes the support leg and the support leg leaves from the 

ground and comes back to Posture 4 (Posture 1). Leg exchange is performed by perfectly 

inelastic collision momentarily. 
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Symbols 


Description 


mi 


i-th link mass 


h 


i -th link moment of inertia 


k 


i -th link length 


«j 


Length between i-th joint and /-th link COG 


h 


Length between (i+l)-th joint and i-th link COG 


mi 


Upper body mass 


d 


Length between upper body mass and hip joint 


8 




:-th link absolute angle 


a 




Offset angle between i-th link and i-th link COG from joint i 


P 




Offset angle between i-th link and i-th link COG from joint (i+1) 


X 




x-coordinate of i-th link COG 


y 




y-coordinate of i-th link COG 


g 


Acceleration of gravity 


Hi 


i-th joint input torque 


] 


Performance function 


£ 




Dummy variable used for inequality constraint 


5 




Clearance between toe and ground 


V 




i-th coefficient vector of basis function 


h 




i-th basis function 


T 


Period of one step 


S 


Length of ste 



Table 2. Symbols used in this section (Hase T. & Huang, Q., 2005) 

y 

x Walking direction 




Fig.6. 3 link model with upper body mass (Hase T. & Huang, Q., 2005). 
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Linkl 


Link2 


Link3 


m; 


[kg] 


4.2 


1.2 


1.2 


h 


[kgm2] 


0.127 


0.029 


0.029 


k 


[m] 


0.60 


0.30 


0.30 


at 


[m] 


0.339 


0.115 


0.125 



Table 3. Link parameters (Hase T. & Huang, Q., 2005). 





Posture 1 Posture 2 Posture 3 Posture 4 

Fig. 7. Process of one step walking (Hase T. & Huang, Q., 2005). 




Fig.8. Impulse for i-th link (Hase T. & Huang, Q., 2005). 



3.1.2 Dynamic Equations and Analysis of Collision between Toe and Ground 

By the analytical model shown in Fig.6, dynamic equations are lead as follows. We assumed 
that there is no friction in each joint. 



M(0)0 + C(0)0 2 +K(0) = u 



(10) 



When collision between toe and ground occurs, it is assumed that a momentum and angular 

momentum of each link are conserved. Leg exchange happens shown as Fig. 7. 

Fig.8 shows impulse for i-th link. "vi x ", "viy" is X, y ingredient of translation velocity of a 

center of gravity of each link. "+" indicates the state just after the swing leg collides with the 

ground, and "-" indicates the state just before that. 

Using the conservation law of momentum and angular momentum, following equations 

work out. 



m i(vf x - v ix ) = P ix - P (i+1)x 



mi(v iy -v iy ) = P iy - 



' (i+l)y 



(11) 



lj(0f-0 t ) = a i xP i +(l i -a i )xP i 



i+i 
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When Eq.(lO) are organized, angular velocity just before and after collision are as follows. 

M(0)0 + =H(0)0~ (12) 

Matrices in Eq.(lO), (12) are written in (Ono, K. & Liu, R., 2002). 

3.2 Optimal Trajectory Planning Method with Inequality Constraint 

3.2.1 Conventional Optimal Trajectory Planning Method (Ono, K. & Liu, R., 2002) 
We apply the function approximation method to express the joint angle trajectory by 
Hermite polynomial function set. Then the optimal trajectory planning problem is converted 
into the parameter optimization problem of the coefficients of the basis function. The joint 
angle trajectory is expressed as follows: 

0(p,t) = h(t) T p (13) 

where h(t) is the Hermite basis function vector and p is the coefficient vector of the basis 

function. Here, the coefficient vector is the boundary state variable vector. 

The total constraint conditions and the performance index which are function of p are 

expressed as follows: 

C(p) = (14) 

J = J(P) (15) 

then, we obtain the solution which make Eq.(15) minimum while Eq.(14) is satisfied. 

The approximated solution p^ +1 in the (fc+l)-th step of Eq.(14) is solved by Newton- 

Raphson iteration method in the following form: 

8C (Pk) A rv » 

- A P = -C(Pk) .... 

dp (16) 

(&P = Pk+i-Pk) 

In Eq.(16), the unknown variable number is far more than the equation number. Therefore, 
the number of the solutions which satisfy Eq.(14) is infinite. A particular solution is solved 
using the singular value decomposition (SVD). 
Then, the general solutions Ap become as follows: 

Ap = Ap -Bq (17) 

Po is a particular solution. B is solved by using the SVD method, and q is the null-space 
vector to be determined from the / minimum condition as follows: 

^- = ^-5 = (18) 

dq dp 

In order to solve q to satisfy Eq.(18), we use Newton-Raphson iteration method. The second 
order partial differentiation of / with respect to q is calculated as follows: 

^L = B T ^L B (19) 

dq 2 dp 2 

Then, we obtain p from q and obtain the optimal solution. 
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3.2.2 Introducing the Inequality Constraint 

Conventionally, we used only equality constraint at the beginning, middle and ending time 
for a biped walking robot. But there is such a case that the joint angle should pass the 
desirable ranges without being restricted precisely. In that case, the better solution will be 
obtained when the joint angle is restricted with inequality constraint than equality constraint. 
When we use inequality constraint, we convert Eq.(14) into follow equations with dummy 
variables. 

I Set initial values p n = I 

1 

[Transi'iimi C(p)^H ink) C (]))=<£ ^ 

i- — — 



Calculate trajectory represented by p 



T 



[ Calculate C(p,. ) and its derivatives by p] 

i 

[Calculate Ap] 
[Calculate CfaJ and its derivatives by p] 

~T~ { 

[Calculate J(p kfl ) and its derivatives by pj 

[Calculate B matrix by SVP] 

[Calculate q by Newton Method) 




(20) 



Fig.9. Flowchart of this algorism (Hase T. & Huang, Q., 2005). 

C(p)>0 => C(p) = g 2 

C(p)<0 => C(p) = -g 2 
Then, we use the dummy variables as optimization parameters. 
For example, if you restrict the state variable of joint angle at boundary time f, as follows: 

0(t ( )>O (21) 

Then, the dummy variable introduced as follows: 

6(1,) = g 2 (22) 

The state variables at boundary time t ,- are expressed as j'-th optimization parameter p .• as 

follows: 

0(t t ) = Pj (23) 
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Here, Eq.(22) and Eq.(23) are organized as follows: 

P,=? 2 (24) 

Then, we convert p, into g in a coefficient parameter of the performance index, basis 

function, and other constraints. Optimization calculation is performed with respect to g and 
other parameters (not p. ). Fig.9 shows the flowchart of this algorithm. 

3.3 Application for Biped Walking Robot 

3.3.1 Performance Function 

In order to obtain a natural walking solution with the lowest input torque, the performance 
index in this paper is defined as the sum of the integration of square input torque as follows: 

J=ik,£u?(t)dt (25) 



Zki$uf(t)dt 



where fc, is the weighting factor corresponding to joint i, and u, is the input torque at joint i. When 
joint i is passive joint, we relatively make k t larger and bring the passive joint torque close to 0. 

3.3.2 The Results of Using Equality Constraint at Beginning and Ending Time 

At first, the period T of one step and the length of the step S are determined. The beginning 
and ending posture is that both legs become straight. This posture is determined when the 
length of the step is determined. 

Fig.10 and Fig.ll show stick figure of trajectory, joint input torque, angular velocity, and 
clearance between toe and ground at T=0.60[s], S=0.15[m]. 
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Fig.10. The result of using equality constraint at beginning and ending time (S=0.150[m] 
T=0.60[s]) (Hase T. & Huang, Q., 2005). 
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Fig.ll. The result of using equality constraint at beginning and ending time ( u^ =0, 
S=0.150[m], T=0.60[s]) (Hase T. & Huang, Q., 2005). 

3.3.3 The Results of Using Inequality Constraint at Intermediate Time 

From the result of preceding section, there is the moment when toe comes in contact with 
ground. There is not utility in this result. 

Then, we give the clearance (s) between toe and ground by constraining the posture at 
intermediate time (0.5T). The constraint is expressed as follows: 

d = -l l cose 1 {03T)-l 2 cos6 2 {Q.5T)-l ?l cose ?> {0.5T) (26) 

This constraint has three variables and becomes redundant. It is desirable that the joint 
angles come to the following range so that it is natural walking. 

6> 2 (0.5T)<0 

<%(0.5T)>0 

These inequality constraints are converted into following equality constraints. 

2 (O.5T) = -Cl 

3 {O.5T) = d 

Figs. 12 and 13 show the results of £ =20[mm]. These results show that clearance condition is 
satisfied. 



(27) 



(28) 



3.3.4 A Change of the Evaluation Function for a Change of a Period 

Figs. 14 and 15 show the changes of performance function J for change of period T. 
Conventionally, we constrained the swing leg posture at intermediate time with equality 
constraint. These graphs compare this method (inequality) with a conventional method 
(equality), and the performance function J by this method is smaller than that of a 
conventional method. They have local minimum solutions between 0.6s from 0.5s. 
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Fig.12. The result of using inequality constraint at intermediate time (g =20[mm], S=0.150[m] 
T=0.55[s]) (Hase T. & Huang, Q., 2005). 
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Fig.13. The result of using inequality constraint at intermediate time ( Uj =0, S =20[mm], 
S=0.150[m], T=0.55[s]) (Hase T. & Huang, Q., 2005). 
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Fig.14. Change of performance function J for change of period T (<y=20[mm], S=0.15[m]) 

(Hase T. & Huang, Q., 2005). 
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Fig.15. Change of performance function J for change of period T ( u^ =0, S =20[mm], 
S=0.15[m]) (Hase T. & Huang, Q., 2005). 

3.3.5 The Influence of Installation position of the upper body mass 

Fig.16 shows influence of installation position of the upper body mass. When ui is 0, the 
input becomes larger as the installation position becomes longer. When uo is 0, there is an 
optimal installation position at 80mm. 



3.4. Walking Experiment for Energy-Efficient Trajectory Planning 

3.4.1 Hardware 

Fig.17 shows the summary of the prototype robot. This robot has three legs to constraint the 
motion of roll direction. Hip joint and knee joint have actuators. These actuators are DC 
motors with gear reducer. They are controlled by motor drivers. Table.4 shows the 
parameters of the prototype robot. 
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Fig.16. The influence of installation position of the upper body mass (<5=20[mm], T=0.55[s]) 

(Hase T. & Huang, Q., 2005). 




Fig.17. Prototype robot (Hase T. & Huang, Q., 2005). 





Inner 


Outer 


Thigh 


Shank 


Thigh 


Shank 


m[kg] 


3.02 


1.22 


2.92 


1.20 


I [kgm2] 


0.027 


0.015 


0.031 


0.017 


l[m] 


0.30 


0.30 


0.30 


0.30 


a[m] 


0.10 


0.12 


0.13 


0.13 



Table 4. Link parameters (Hase T. & Huang, Q., 2005). 
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3.4.2 Software 

We performed an optimization calculation in offline beforehand and save the data of 
relative joint angle trajectory. It is the best that robot controlled with feed forward control of 
optimal torque. But it is difficult to measure the parameters of experimental robot and 
collision between toe and ground precisely. Therefore, we used PD feedback control for each 
joint and let the experimental trajectory follow the optimal trajectory. 

In the experiment, the timing of a landing does not agree with the analysis and the delay 
occurs for optimal trajectory. Then, we attached a touch sensor at a sole and the input of the 
next period starts when a sensor becomes on. The robot keeps the ending posture until the 
toe lands. 

The analysis model has only three links and assumed the support leg to be one link. We 
attached stoppers to knee joints and fixed it by pushing it to stoppers with motors when the 
leg becomes support leg. 

3.4.3 The Results of Walking Experiment 

Figs. 18 and 19 show the results of eight steps of walking experiment. Fig.18 is relative joint 
angle. It has delay, but almost follows the optimal trajectory. Because we used PD feed back 
control in order to cancel model simplifications (no backlash, friction at joints, perfect 
inelastic collision between toe and ground). Fig.19 shows the PD feedback input torque. 
When the legs collide with the ground, they vibrate and PD feedback input becomes larger. 
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Fig.18. Joint angle trajectory (Hase T. & Huang, Q., 2005). 

4. Energy Efficiency 

Energy efficiency of walking can be evaluated with Specific Cost "C" (the energy consumed 

per unit mass and per transport velocity) as follows: 

P 
C = -!— (29) 

mgv 

Here, P is the energy consumed per unit of time [W], m is mass [kg] , v is transport 

velocity[m/s]. 

In the case of the self-excited mechanism, there is one type of Specific Cost Cmt that 

calculated by the simulation results in section 2.3. In this case, C m t is obtained with the 
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mechanical work energy (4W) and walking velocity (0.28m/ s) for the 8kg weight of the 
robot model. Then, C m t become about 0.18. 




12 3 4 

Time [s] 

Fig.19. Joint input torque (Hase T. & Huang, Q., 2005). 



And then, in the case of the optimal trajectory planning, there are two types of Specific Cost 
(Cet, C mt ). Cet is obtained with the energy that consumed at the motors (11. 7W), 
microcomputer, and sensors (5W). Cet become 0.68. C mt is obtained with the mechanical 
work energy (8.8W). Then, C m t become 0.36. Table.5 shows C e t and C m t of some types of 
biped walking robots (Collins, S. H. & Ruina A., 2005). 





C et 


C m t 


Actuator 


Honda's Asimo 

Our Robot (Self-Excited Machanism) 

Our Robot (Optimal trajiectry planning) 


3.2 
0.68 


1.6 
0.18 
0.36 


Geared motor 


T.U.Delft's Denise 
MIT's Spring Flamingo Cornell's Biped 


5.3 
2.8 
0.2 


0.08 
0.07 
0.055 


Pneumatic McKibben muscle 

Series elastic actuator 

Push-off actuator 


McGeer's Dynamite 


- 


0.04 





Table 5. Specific cost of transport and mechanical energy efficiency of biped walking robots. 

As our robots uses gears of large reduction ratio, the energy consumption by friction grows 
large, and Specific Cost grows larger than the other robots which used the mechanism of 
passive dynamic walking. However, it is very smaller than the other type robot using the 
gears of large reduction ratio. 



5. Conclusion 

In this chapter, we introduced the researches on energy-efficient walking of the biped robot 
for level ground form two viewpoints, one is semi-passive dynamic walking with only hip 
actuator using self-excited mechanism, another is active walking with actuators using 
optimal trajectory planning. We can make the conclusions as follow. 
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In the case of energy-efficient walking using the self -excited mechanism: 

1. To develop an efficient control method of natural biped locomotion, the self- 
excitation control of asymmetry stiffness type was applied to the biped mechanism 
to generate natural walking. 

2. In a two-DOF swing leg whose thigh and shank are connected by a passive knee 
joint, the input torque at the hip joint can generate a walking gait of the swing leg 
and can increase the swing amplitude and its kinetic energy. 

3. From the simulation for a biped mechanism whose leg is as tall as 0.8 m 
(similar to a standard adult), the walking period is 1.3s and walking velocity 
0.28m/ s at an average input power of 4W. According to the Specific Cost data 
comparing with the other types of biped walking robots, we showed that the 
energy-efficient walking is possible using the self-excited mechanism for biped 
locomotion. 

In the case of energy-efficient walking using the optimal trajectory planning: 

1. We proposed optimal trajectory planning method based on a function approximation 
method, and applied for a biped walking robot analytically. And in order to avoid the 
solutions of undesired postures when constrain the clearance between toe and ground at 
intermediate time of one period, we proposed an optimal trajectory planning method with 
inequality constraint, and applied it to a biped walking robot with upper body mass. 

2. We performed a walking experiment by a prototype robot to inspect the effectiveness of 
this method. In the walking experiment, we realized the stable walking. One step period 
was 0.6s, and walking velocity was 0.3m/s. 

3. We showed the energy-efficient walking using the proposed optimal trajectory 
planning method for biped locomotion by comparing Specific Cost of other robots with 
geared motors. 
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1. Introduction 

Why are people attracted to humanoid robots and androids? The answer is simple: because 
human beings are attuned to understand or interpret human expressions and behaviors, 
especially those that exist in their surroundings. As they grow, infants, who are supposedly 
born with the ability to discriminate various types of stimuli, gradually adapt and fine-tune 
their interpretations of detailed social clues from other voices, languages, facial expressions, 
or behaviors (Pascalis et al., 2002). Perhaps due to this functionality of nature and nurture, 
people have a strong tendency to anthropomorphize nearly everything they encounter. This 
is also true for computers or robots. In other words, when we see PCs or robots, some 
automatic process starts running inside us that tries to interpret them as human. The media 
equation theory (Reeves & Nass, 1996) first explicitly articulated this tendency within us. 
Since then, researchers have been pursuing the key element to make people feel more 
comfortable with computers or creating an easier and more intuitive interface to various 
information devices. This pursuit has also begun spreading in the field of robotics. Recently, 
researcher's interests in robotics are shifting from traditional studies on navigation and 
manipulation to human-robot interaction. A number of researches have investigated how 
people respond to robot behaviors and how robots should behave so that people can easily 
understand them (Fong et al., 2003; Breazeal, 2004; Kanda et al., 2004). Many insights from 
developmental or cognitive psychologies have been implemented and examined to see how 
they affect the human response or whether they help robots produce smooth and natural 
communication with humans. 

However, human-robot interaction studies have been neglecting one issue: the "appearance 
versus behavior problem." We empirically know that appearance, one of the most significant 
elements in communication, is a crucial factor in the evaluation of interaction (See Figure 1). 
The interactive robots developed so far had very mechanical outcomes that do appear as 
"robots." Researchers tried to make such interactive robots "humanoid" by equipping them 
with heads, eyes, or hands so that their appearance more closely resembled human beings 
and to enable them to make such analogous human movements or gestures as staring, 
pointing, and so on. Functionality was considered the primary concern in improving 
communication with humans. In this manner, many studies have compared robots with 
different behaviors. Thus far, scant attention has been paid to robot appearances. Although 
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there are many empirical discussions on such very simple static robots as dolls, the design of 
a robot's appearance, particularly to increase its human likeness, has always been the role of 
industrial designers; it has seldom been a field of study. This is a serious problem for 
developing and evaluating interactive robots. Recent neuroimaging studies show that 
certain brain activation does not occur when the observed actions are performed by non- 
human agents (Perani et al., 2001; Han et al., 2005). Appearance and behavior are tightly 
coupled, and concern is high that evaluation results might be affected by appearance. 




Fig. 1. Three categories of humanlike robots: humanoid robot Robovie II (left: developed by 
ATR Intelligent Robotics and Communication Laboratories), android Repliee Q2 (middle: 
developed by Osaka University and Kokoro corporation), geminoid HI-1 and its human 
source (right: developed by ATR Intelligent Robotics and Communication Laboratories). 

In this chapter, we introduce android science, a cross-interdisciplinary research framework 
that combines two approaches, one in robotics for constructing very humanlike robots and 
androids, and another in cognitive science that uses androids to explore human nature. Here 
androids serve as a platform to directly exchange insights from the two domains. To 
proceed with this new framework, several androids have been developed so far, and many 
researches have been done. At that time, however, we encountered serious issues that 
sparked the development of a new category of robot called geminoid. Its concept and the 
development of the first prototype are described. Preliminary findings to date and future 
directions with geminoids are also discussed. 

2. Android Science 

Current robotics research uses various findings from the field of cognitive science, especially 
in the human-robot interaction area, trying to adopt findings from human-human 
interactions with robots to make robots that people can easily communicate with. At the 
same time, cognitive science researchers have also begun to utilize robots. As research fields 
extend to more complex, higher-level human functions such as seeking the neural basis of 
social skills (Blakemore, 2004), expectations will rise for robots to function as easily 
controlled apparatuses with communicative ability. However, the contribution from 
robotics to cognitive science has not been adequate because the appearance and behavior of 
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current robots cannot be separately handled. Since traditional robots look quite mechanical 
and very different from human beings, the effect of their appearance may be too strong to 
ignore. As a result, researchers cannot clarify whether a specific finding reflects the robot's 
appearance, its movement, or a combination of both. 

We expect to solve this problem using an android whose appearance and behavior closely 
resembles humans. The same thing is also an issue in robotics research, since it is difficult to 
clearly distinguish whether the cues pertain solely to robot behaviors. An objective, 
quantitative means of measuring the effect of appearance is required. 

Androids are robots whose behavior and appearance are highly anthropomorphized. 
Developing androids requires contributions from both robotics and cognitive science. 
To realize a more humanlike android, knowledge from human sciences is also 
necessary. At the same time, cognitive science researchers can exploit androids for 
verifying hypotheses in understanding human nature. This new, bi-directional, cross- 
interdisciplinary research framework is called android science (Ishiguro, 2005). Under 
this framework, androids enable us to directly share knowledge between the 
development of androids in engineering and the understanding of humans in cognitive 
science (Figure 2). 
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Fig. 2. Bi-directional feedback in Android Science. 

The major robotics issue in constructing androids is the development of humanlike 
appearance, movements, and perception functions. On the other hand, one issue in 
cognitive science is "conscious and unconscious recognition." The goal of android science is 
to realize a humanlike robot and to find the essential factors for representing human 
likeness. How can we define human likeness? Further, how do we perceive human likeness? 
It is common knowledge that humans have conscious and unconscious recognition. When 
we observe objects, various modules are activated in our brain. Each of them matches the 
input sensory data with human models, and then they affect reactions. A typical example is 
that even if we recognize a robot as an android, we react to it as a human. This issue is 
fundamental both for engineering and scientific approaches. It will be an evaluation 
criterion in android development and will provide cues for understanding the human 
brain's mechanism of recognition. 

So far, several androids have been developed. Repliee Q2, the latest android (Ishiguro, 
2005), is shown in the middle of Figure 1. Forty-two pneumatic actuators are embedded in 
the android's upper torso, allowing it to move smoothly and quietly. Tactile sensors, which 
are also embedded under its skin, are connected to sensors in its environment, such as 
omnidirectional cameras, microphone arrays, and floor sensors. Using these sensory inputs, 
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the autonomous program installed in the android can make smooth, natural interactions 
with people near it. 

Even though these androids enabled us to conduct a variety of cognitive experiments, they 
are still quite limited. The bottleneck in interaction with human is its lack of ability to 
perform long-term conversation. Unfortunately, since current AI technology for developing 
humanlike brains is limited, we cannot expect humanlike conversation with robots. When 
meeting humanoid robots, people usually expect humanlike conversation with them. 
However, the technology greatly lags behind this expectation. AI progress takes time, and 
such AI that can make humanlike conversation is our final goal in robotics. To arrive at this 
final goal, we need to use currently available technologies and understand deeply what a 
human is. Our solution for this problem is to integrate android and teleoperation 
technologies. 

3. Geminoid 




Fig. 3. Geminoid HI-1 (right). 

We have developed Geminoid, a new category of robot, to overcome the bottleneck issue. We 
coined "geminoid" from the Latin "geminus," meaning "twin" or "double," and added 
"oides," which indicates "similarity" or being a twin. As the name suggests, a geminoid is a 
robot that will work as a duplicate of an existing person. It appears and behaves as a person 
and is connected to the person by a computer network. Geminoids extend the applicable 
field of android science. Androids are designed for studying human nature in general. With 
geminoids, we can study such personal aspects as presence or personality traits, tracing 
their origins and implemention into robots. Figure 3 shows the robotic part of HI-1, the first 
geminoid prototype. Geminoids have the following capabilities: 



Appearance and behavior highly similar to an existing person 

The appearance of a geminoid is based on an existing person and does not depend on the 
imagination of designers. Its movements can be made or evaluated simply by referring to 
the original person. The existence of a real person analogous to the robot enables easy 
comparison studies. Moreover, if a researcher is used as the original, we can expect that 
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individual to offer meaningful insights into the experiments, which are especially important 
at the very first stage of a new field of study when beginning from established research 
methodologies. 

Teleoperation (remote control) 

Since geminoids are equipped with teleoperation functionality, they are not only driven by 
an autonomous program. By introducing manual control, the limitations in current AI 
technologies can be avoided, enabling long-term, intelligent conversational human-robot 
interaction experiments. This feature also enables various studies on human characteristics 
by separating "body" and "mind." In geminoids, the operator (mind) can be easily 
exchanged, while the robot (body) remains the same. Also, the strength of connection, or 
what kind of information is transmitted between the body and mind, can be easily 
reconfigured. This is especially important when taking a top-down approach that 
adds/ deletes elements from a person to discover the "critical" elements that comprise 
human characteristics. Before geminoids, this was impossible. 



3.1 System overview 

The current geminoid prototype, HI-1, consists of roughly three elements: a robot, a central 
controlling server (geminoid server), and a teleoperation interface (Figure 4). 




Teleoperation 

interface The Internet 

Fig. 4. Overview of geminoid system. 
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A robot that resembles a living person 

The robotic element has essentially identical structure as previous androids (Ishiguro, 2005). 
However, efforts concentrated on making a robot that appears — not just to resemble a living 
person — to be a copy of the original person. Silicone skin was molded by a cast taken from 
the original person; shape adjustments and skin textures were painted manually based on 
MRI scans and photographs. Fifty pneumatic actuators drive the robot to generate smooth 
and quiet movements, which are important attributes when interacting with humans. The 
allocations of actuators were decided so that the resulting robot can effectively show the 
necessary movements for human interaction and simultaneously express the original 
person's personality traits. Among the 50 actuators, 13 are embedded in the face, 15 in the 
torso, and the remaining 22 move the arms and legs. The softness of the silicone skin and the 
compliant nature of the pneumatic actuators also provide safety while interacting with 
humans. Since this prototype was aimed for interaction experiments, it lacks the capability 
to walk around; it always remains seated. Figure 1 shows the resulting robot (right) 
alongside the original person, Dr. Ishiguro (author). 
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Teleoperation interface 

Figure 5 shows the teleoperation interface prototype. Two monitors show the controlled 
robot and its surroundings, and microphones and a headphone are used to capture and 
transmit utterances. The captured sounds are encoded and transmitted to the geminoid 
server by IP links from the interface to the robot and vice versa. The operator's lip corner 
positions are measured by an infrared motion capturing system in real time, converted to 
motion commands, and sent to the geminoid server by the network. This enables the 
operator to implicitly generate suitable lip movement on the robot while speaking. However, 
compared to the large number of human facial muscles used for speech, the current robot 
only has a limited number of actuators on its face. Also, response speed is much slower, 
partially due to the nature of the pneumatic actuators. Thus, simple transmission and 
playback of the operator's lip movement would not result in sufficient, natural robot motion. 
To overcome this issue, measured lip movements are currently transformed into control 
commands using heuristics obtained through observation of the original person's actual lip 
movement. 




Fig. 5. Teleoperation interface. 

The operator can also explicitly send commands for controlling robot behaviors using a 
simple GUI interface. Several selected movements, such as nodding, opposing, or staring in 
a certain direction can be specified by a single mouse click. This relatively simple interface 
was prepared because the robot has 50 degrees of freedom, which makes it one of the 
world's most complex robots, and is basically impossible to manipulate manually in real 
time. A simple, intuitive interface is necessary so that the operator can concentrate on 
interaction and not on robot manipulation. Despite its simplicity, by cooperating with the 
geminoid server, this interface enables the operator to generate natural humanlike motions 
in the robot. 



Geminoid server 

The geminoid server receives robot control commands and sound data from the remote 
controlling interface, adjusts and merges inputs, and sends and receives primitive 
controlling commands between the robot hardware. Figure 6 shows the data flow in the 
geminoid system. The geminoid server also maintains the state of human-robot interaction 
and generates autonomous or unconscious movements for the robot. As described above, as 
the robot's features become more humanlike, its behavior should also become suitably 
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sophisticated to retain a "natural" look (Minato et al., 2006). One thing that can be seen in 
every human being, and that most robots lack, are the slight body movements caused by an 
autonomous system, such as breathing or blinking. To increase the robot's naturalness, the 
geminoid server emulates the human autonomous system and automatically generates these 
micro-movements, depending on the state of interaction each time. When the robot is 
"speaking," it shows different micro-movements than when "listening" to others. Such 
automatic robot motions, generated without operator's explicit orders, are merged and 
adjusted with conscious operation commands from the teleoperation interface (Figure 6). 
Alongside, the geminoid server gives the transmitted sounds specific delay, taking into 
account the transmission delay/jitter and the start-up delay of the pneumatic actuators. This 
adjustment serves synchronizing lip movements and speech, thus enhancing the naturalness 
of geminoid movement. 
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Fig. 6. Data flow in the geminoid system. 



3.2 Experiences with the geminoid prototype 

The first geminoid prototype, HI-1, was completed and press-released in July 2006. Since 
then, numerous operations have been held, including interactions with lab members and 
experiment subjects. Also, geminoid was demonstrated to a number of visitors and 
reporters. During these operations, we encountered several interesting phenomena. Here 
are some discourses made by the geminoid operator: 

• When I (Dr. Ishiguro, the origin of the geminoid prototype) first saw HI-1 sitting 
still, it was like looking in a mirror. However, when it began moving, it looked like 
somebody else, and I couldn't recognize it as myself. This was strange, since we 
copied my movements to HI-1, and others who know me well say the robot 
accurately shows my characteristics. This means that we are not objectively 
recognizing our unconscious movements ourselves. 

• While operating HI-1 with the operation interface, I find myself unconsciously 
adapting my movements to the geminoid movements. The current geminoid 
cannot move as freely as I can. I felt that, not just the geminoid but my own body is 
restricted to the movements that HI-1 can make. 
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• In less than 5 minutes both the visitors and I can quickly adapt to conversation 
through the geminoid. The visitors recognize and accept the geminoid as me while 
talking to each other. 

• When a visitor pokes HI-1, especially around its face, I get a strong feeling of 
being poked myself. This is strange, as the system currently provides no tactile 
feedback. Just by watching the monitors and interacting with visitors, I get this 
feeling. 

We also asked the visitors how they felt when interacting through the geminoid. Most said 
that when they saw HI-1 for the very first time, they thought that somebody (or Dr. Ishiguro, 
if familiar with him) was waiting there. After taking a closer look, they soon realized that 
HI-1 was a robot and began to have some weird and nervous feelings. But shortly after 
having a conversation through the geminoid, they found themselves concentrating on the 
interaction, and soon the strange feelings vanished. Most of the visitors were non- 
researchers unfamiliar with robots of any kind. 

Does this mean that the geminoid has overcome the "uncanny valley"? Before talking 
through the geminoid, the initial response of the visitors seemingly resembles the reactions 
seen with previous androids: even though at the very first moment they could not recognize 
the androids as artificial, they nevertheless soon become nervous while being with the 
androids. Are intelligence or long-term interaction crucial factors in overcoming the valley 
and arriving at an area of natural humanness? 

We certainly need objective means to measure how people feel about geminoids and other 
types of robots. In a previous android study, Minato et al. found that gaze fixation revealed 
criteria about the naturalness of robots (Minato et al., 2006). Recent studies have shown 
different human responses and reactions to natural or artificial stimuli of the same nature. 
Perani et al. showed that different brain regions are activated while watching human or 
computer graphic arms movements (Perani et al., 2001). Kilner et al. showed that body 
movement entrainment occurs when watching human motions, but not with robot motions 
(Kilner et al., 2003). By examining these findings with geminoids, we may be able to find 
some concrete measurements of human likeliness and approach the "appearance versus 
behavior" issue. 

Perhaps HI-1 was recognized as a sort of communication device, similar to a telephone or a 
TV-phone. Recent studies have suggested a distinction in the brain process that 
discriminates between people appearing in videos and existing persons appearing live 
(Kuhl et al., 2003). While attending TV conferences or talking by cellular phones, however, 
we often experience the feeling that something is missing from a face-to-face meeting. What 
is missing here? Is there an objective means to measure and capture this element? Can we 
ever implement this on robots? 

4. Summary and further issues 

In developing the geminoid, our purpose is to study Sonzai-Kan , or human presence, by 

extending the framework of android science. The scientific aspect must answer questions 

about how humans recognize human existence/ presence. The technological aspect must 

realize a teleoperated android that works on behalf of the person remotely accessing it. 

This will be one of the practical networked robots realized by integrating robots with the 

Internet. 

The following are our current challenges: 
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Teleoperation technologies for complex humanlike robots 

Methods must be studied to teleoperate the geminoid to convey existence/ presence, 
which is much more complex than traditional teleoperation for mobile and industrial 
robots. We are studying a method to autonomously control an android by transferring 
motions of the operator measured by a motion capturing system. We are also 
developing methods to autonomously control eye-gaze and humanlike small and large 
movements. 

Synchronization between speech utterances sent by the teleoperation system and body 
movements 

The most important technology for the teleoperation system is synchronization between 
speech utterances and lip movements. We are investigating how to produce natural 
behaviors during speech utterances. This problem is extended to other modalities, such as 
head and arm movements. Further, we are studying the effects on non-verbal 
communication by investigating not only synchronization of speech and lip movements but 
also facial expressions, head, and even whole body movements. 

Psychological test for human existence/presence 

We are studying the effect of transmitting Sonzai-Kan from remote places, such as meeting 
participation instead of the person himself. Moreover, we are interested in studying 
existence/ presence through cognitive and psychological experiments. For example, we are 
studying whether the android can represent the authority of the person himself by 
comparing the person and the android. 

Application 

Although being developed as research apparatus, the nature of geminoids can allow us 
to extend the use of robots in the real world. The teleoperated, semi-autonomous 
facility of geminoids allows them to be used as substitutes for clerks, for example, that 
can be controlled by human operators only when non- typical responses are required. 
Since in most cases an autonomous AI response will be sufficient, a few operators will 
be able to control hundreds of geminoids. Also because their appearance and behavior 
closely resembles humans, in the next age geminoids should be the ultimate interface 
device. 
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1. Introduction 



Demand for robots is shifting from their use in industrial applications to their use in 
domestic situations, where they "live" and interact with humans. Such robots require 
sophisticated body designs and interfaces to do this. Humanoid robots that have multi- 
degrees-of-freedom (MDOF) have been developed, and they are capable of working with 
humans using a body design similar to humans. However, it is very difficult to intricately 
control robots with human generated, preprogrammed, learned behavior. Learned behavior 
should be acquired by the robots themselves in a human-like way, not programmed 
manually. Humans learn actions by trial and error or by emulating someone else's actions. 
We therefore apply reinforcement learning for the control of humanoid robots because this 
process resembles a human's trial and error learning process. 

Many existing methods of reinforcement learning for control tasks involve discrediting state 
space using BOXES (Michie & Chambers, 1968; Sutton & Barto, 1998) or CMAC 
(Albus, 1981) to approximate a value function that specifies what is advantageous in the 
long run. However, these methods are not effective for doing generalization and cause 
perceptual aliasing. Other methods use basis function networks for treating continuous state 
space and actions. 

Networks with sigmoid functions have the problem of catastrophic interference. They 
are suitable for off-line learning, but are not adequate for on-line learning such as that 
needed for learning motion (Boyan & Moore, 1995; Schaal & Atkeson, 1996). On the 
contrary, networks with radial basis functions are suitable for on-line learning. However, 
learning using these functions requires a large number of units in the hidden layer, 
because they cannot ensure sufficient generalization. To avoid this problem, methods of 
incremental allocation of basis functions and adaptive state space formation were 
proposed (Morimoto & Doya, 1998; Samejima & Omori, 1998; Takahashi et al., 1996; 
Moore & Atkeson, 1995). 

In this chapter, we propose a dynamic allocation method of basis functions called 
Allocation/ Elimination Gaussian Softmax Basis Function Network (AE-GSBFN), that is 
used in reinforcement learning to treat continuous high-dimensional state spaces. AE- 
GSBFN is a kind of actor-critic method that uses basis functions and it has allocation and 
elimination processes. In this method, if a basis function is required for learning, it is 
allocated dynamically. On the other hand, if an allocated basis function becomes redundant, 
the function is eliminated. This method can treat continuous high-dimensional state spaces 
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because the allocation and elimination processes reduce the number of basis functions 
required for evaluation of the state space. 
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Fig. 1. Actor-critic architecture. 
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Fig. 2. Basis function network. 



To confirm the effectiveness of our method, we used computer simulation to show how a 
humanoid robot learned two motions: a standing-up motion from a seated position on chair 
and a foot-stamping motion. 



2. Actor-Critic Method 

In this section, we describe an actor-critic method using basis functions, and we apply it to 
our method. 

Actor-critic methods are temporal difference (TD) methods that have a separate memory 
structure to explicitly represent the policy independent of the value function (Sutton & Barto, 
1998). Actor-critic methods are constructed by an actor and a critic, as depicted in Figure 1. The 
policy structure is known as the actor because it is used to select actions, and the estimated 
value function is known as the critic because it criticizes the actions made by the actor. 
The actor and the critic each have a basis function network for learning of continuous state 
spaces. Basis function networks have a three-layer structure as shown in Figure 2, and basis 
functions are placed in middle-layer units. Repeating the following procedure, in an actor- 
critic method using basis function networks, the critic correctly estimates the value function 
V(s), and then the actor acquires actions that maximize V(s). 

1. When state s(f) is observed in the environment, the actor calculates the j'-th value Uj(t) 
of the action u(t) as follows (Gullapalli, 1990): 
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MO = C x /£^(s(0)+Mol (i) 

where w™ ax is a maximal control value, N is the number of basis functions, b;(s(f)) is a 
basis function, ffl,.is a weight, rij(t) is a noise function, and g() is a logistic sigmoid 
activation function whose outputs lie in the range (-1, 1). The output value of actions is 
saturated into M ; max byg(). 

2. The critic receives the reward r(t), and then observes the resulting next state s(f+l). The 
critic provides the TD-error 5(f) as follows: 

S(t) = r(t) + r V(s(t + l))-V(s(t)), (2) 

where y is a discount factor, and V(s) is an estimated value function. Here, V(s(t)) is 
calculated as follows: 

N 

v(s(o) = 2>A(s(0), (3) 

i 

where Vi is a weight. 

3. The actor updates weight o,. using TD-error: 

w^m^m^WMt)), (4) 

where /? is a learning rate. 

4. The critic updates weight V{. 

Vj <—Vj +aS(t)e,, (5) 

where a is a learning rate, and e; is an eligibility trace. Here, e, is calculated as follows: 

e 1 <^yAe i +b(s(t)), (6) 

where X is a trace-decay parameter. 

5. Time is updated. 

t <- 1 + Af . (7) 

Note that At is 1 in general, but we used the description of Af for the control interval of the 
humanoid robots. 

3. Dynamic Allocation of Basis Functions 

In this chapter, we propose a dynamic allocation method of basis functions. This method is 
an extended application of the Adaptive Gaussian Softmax Basis Function Network (A- 
GSBFN) (Morimoto & Doya, 1998, 1999). A-GSBFN only allocates basis functions, whereas 
our method both allocates and eliminates them. In this section, we first briefly describe A- 
GSBFN in Section 3.1; then we propose our method, Allocation/ Elimination Gaussian 
Softmax Basis Function Network (AE-GSBFN), in Section 3.2. 

3.1 A-GSBFN 

Networks with sigmoid functions have the problem of catastrophic interference. They are 
suitable for off-line learning, but not adequate for on-line learning. In contrast, networks 
with radial basis functions (Figure 3) are suitable for on-line learning, but learning using 
these functions requires a large number of units, because they cannot ensure sufficient 
generalization. The Gaussian softmax functions (Figure 4) have the features of both sigmoid 
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functions and radial basis functions. Networks with the Gaussian softmax functions can 
therefore assess state space locally and globally, and enable learning motions of humanoid 
robots. 










. 



Fig. 3. Shape of radial basis functions. Four radial basis functions are visible here, but it is 
clear that the amount of generalization done is insufficient. 




Fig. 4. Shape of Gaussian softmax basis functions. Similar to Figure 3, there are four basis 
functions. Using Gaussian softmax basis functions, global generalization is done, such as 
using sigmoid functions. 

The Gaussian softmax basis function is used in A-GSBFN and is given by the following 
equation: 

"■(»(*)) 



b,(s(t))- 



N 
k 



'(*)) 



(8) 



where fl ( (s(f)) is a radial basis function, and N is the number of radial basis functions. Radial 
basis function 0>(s(f)) in the i-th unit is calculated by the following equation: 

1 



«,- (*(*)) = exp[ — ^||M(s( / ) - c, ) 



(9) 



where c, is the center of the i-th basis function, and M is a matrix that determines the shape 
of the basis function. 
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In A-GSBFN, a new unit is allocated if the error is larger than threshold <? max and the 
activation of all existing units is smaller than threshold a m j n : 

\W)\> S m a x and max fl;( s (0)< fl min> (1°) 

where h(t) is defined as h(t) = S(t)n At) at the actor, and h(t) = S(t) at the critic. The new 
unit is initialized with c, = s, and cj, = . 

3.2 Allocation/Elimination GSBFN 

To perform allocation and elimination of basis functions, we introduce three criteria into A- 
GSBFN: trace e i of activation of radial basis functions, additional control time rj , and 
existing time r l of radial basis functions. The criteria e t and r, are prepared for all basis 
functions, and rj is prepared for both actor and critic networks. A learning agent can gather 
further information on its own states by using these criteria. 
We now define the condition of allocation of basis functions. 

Definition 1 -Allocation 

A new unit is allocated at c, = s (£) if the following condition is satisfied at the actor or critic 
networks: 

|M0| ><y max and max «,( s (0)< fl min 

and 1>T !idd , (11) 

where T add is a threshold. 

■ 

Let us consider using condition (10) for allocation. This condition is only considered for 
allocation, but it is not considered as a process after a function is eliminated. Therefore, 
when a basis function is eliminated, another basis function is immediately allocated at the 
near state of the eliminated function. To prevent immediate allocation, we introduced 
additional control time rj into the condition of allocation. The value of rj monitors the 
length of time that has elapsed since a basis function was eliminated. Note that rj is 
initialized at 0, when a basis function is eliminated. 
We then define the condition of elimination using s t and r, . 

Definition 2 - Elimination 

The basis function b,(s(f)) is eliminated if the following condition is satisfied in the actor or 

critic networks. 

e t >e mm and r, >T crasc , (12) 

where e max and T crasc are thresholds. 

J 
The trace e t of the activation of radial basis functions is updated at each step in the 
following manner: 

El ^ KE ,+a,( S {t)), (13) 

where k is a discount rate. Using e l , the learning agent can sense states that it has recently 
taken. The value of e i takes a high value if the agent stays in almost the same state. This 
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situation is assumed when the learning falls into a local minimum. Using the value of e i , we 
consider how to avoid the local minimum. Moreover, using r l , we consider how to inhibit a 
basis function from immediate elimination after it is allocated. We therefore defined the 
condition of elimination using e t and r, . 




Fig. 5. Learning motion; standing up from a chair. 



4. Experiments 

4.1 Standing-up motion learning 

In this section, as an example of learning of continuous high-dimensional state spaces, AE- 
GSBFN is applied to a humanoid robot learning to stand up from a chair (Figure 5). The 
learning was simulated using the virtual body of the humanoid robot HOAP-1 made by 
Fujitsu Automation Ltd. Figure 6 shows HOAP-1. The robot is 48 centimeters tall, weighs 6 
kilograms, has 20 DOFs, and has 4 pressure sensors each on the soles of its feet. Additionally, 
angular rate and acceleration sensors are mounted in its chest. To simulate learning, we 
used the Open Dynamics Engine (Smith). 




Fig. 6. HOAP-1 (Humanoid for Open Architecture Platform). 

The robot is able to observe the following vector s(f) as its own state: 

8(0 = K> ^,o K , o K , o A , e A , e p , o P ), (14) 

where W , K , and A are waist, knee, and ankle angles respectively, and P is the pitch 
of its body (see Figure 5). Action u(f) of the robot is determined as follows: 

u(t) = {0 w ,0 K ,0 A ), (15) 
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One trial ended when the robot fell down or time exceeded t to tal = 10 [s]. Rewards r(t) were 
determined by height y [cm] of the robot's chest: 

'stand — V 



T(t): 



-20 



'stand 'down 



(during trial) 



- 20 | f totai- f | (on failure) 

where /stand = 35 [cm] is the position of the robot's chest in an upright posture, and /down = 20 
[cm] is its center in a falling-down posture. We used uJ Lax =/r/36, y = 0.9, /? = 0.1 , 

a =0.02, X = 0.6 , and At = 0.01 [s] for parameters in Section 2, M=(1.0, 0.57, 1.0, 0.57, 1.0, 
0.57, 1.0, 0.57), £ max = 0.5 , and a min = 0.4 in Section 3.1, and T add = 1 [s], k = 0.9 , s max = 5.0 , 
and T crasc = 3 [s] in Section 3.2. 

Figure 7 shows the learning results. First, the robot learned to fall down backward, as shown 
in i). Second, the robot intended to stand up from a chair, but fell forward, as shown in ii), 
because it could not yet fully control its balance. Finally, the robot stood up while 
maintaining its balance, as shown in iii). The number of basis functions in the 2922nd trial 
was 72 in both actor and critic networks. Figure 8 shows the experimental result with the 
humanoid robot HOAP-1. The result shows that HOAP-1 was able to stand up from a chair, 
as in the simulation. 

We then compared the number of basis functions in AE-GSBFN with the number of basis 
functions in A-GSBFN. Figure 9 shows the number of basis functions of the actor, averaged 
over 20 repetitions. In these experiments, motion learning with both AE-GSBFN and A- 
GSBFN was successful, but the figure indicates that the number of basis functions required 
by AE-GSBFN was fewer than that by A-GSBFN. That is, high dimensional learning is 
possible using AE-GSBFN. Finally, we plotted the height of the robot's chest in successful 
experiments in Figures 10 and 11. In the figures, circles denote a successful stand-up. The 
results show that motion learning with both AE-GSBFN and A-GSBFN was successful. 

4.2 Stamping motion learning 

In Section 4.1, we described our experiment with learning of transitional motion. In this 
section, we describe our experiment with periodic motion learning. We use a stamping 
motion as a periodic motion (Figure 12). Periodic motions, such as locomotion, are difficult 
to learn only through reinforcement learning, so in many cases, a Central Pattern Generator 
(CPG), etc., is used in addition to reinforcement learning (e.g., Mori et al., 2004). In this 
experiment, we use inverse kinematics and AE-GSBFN to obtain a stamping motion. 

Inverse kinematics calculates the amount of change of each joint angle from the amount 
P of change of the coordinates of a link model: 

= J'(0)P, (17) 

where ](0) is the Jacobian matrix. Generally, since the dimension of differs from the dimension 
of P , ](0) does not become a regular matrix, and its inverse matrix cannot be calculated. 
Moreover, even if it could be calculated, a motion resolution by J~ (0) cannot be performed in the 
neighborhood of singular points, which are given by around det ](0) = . To solve these 
problems, we used the following function (Nakamura & Hanafusa, 1984) in this section: 

= J T (JJ T +kjy'P, (18) 
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i) 300 th trial 



t. 



ii) 1031st trial 






iii) 1564* trial 




Fig. 7. Learning results. 
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Fig. 8. Experimental result with HOAP-1. 
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Fig. 9. Number of basis functions in the actor network (averaged over 20 repetitions). 
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Tnsis 
Fig. 10. Height of robot's chest with AE-GSBFN. Circles denote successful stand-ups. 

where k s is a scalar function with which it becomes a positive value near singular points and 
becomes otherwise: 



*b|l- 



(19) 



(w < w ) 

(otherwise) 

where fco is a positive parameter, wo is a threshold that divides around singular points from 



the others, and w is given by w = yjdet](0)J T (0) 



In this experiment, the coordinate of the end of the legs is given by inverse kinematics (i.e., 
up-down motion of the legs is given), and motion of the horizontal direction of the waist is 
learned by AE-GSBFN. The coordinate value was acquired from the locomotion data of 
HOAP-1. Concretely, motion is generated by solving inverse kinematics from p w to the 
idling leg, and from the supporting leg to p w (Figure 12 (a)). The change of supporting and 
idling legs is also acquired from HOAP-l's locomotion data. 
M 
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Fig. 11. Height of robot's chest with A-GSBFN. Circles denote successful stand-ups. 
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Right Leg! 

(a) Frontal plane 
Fig. 12. Learning motion; stamping the foot 



(b) Sagittal plane 



The robot is able to observe the following vector s(f) as its own state: 



'WF i "WF i U WF i W WF 
l(R) MR) ML) ML) 



|(R) MR) ML) ML) n 
AF ' U AF ' U AF ' U AF ' a PF ' 



s(t) = (0^ * (R) * (1) * (L) 

, " ,', " , ; ,/"' o (i<) f) (L) o (L) o '> ) 

J WS' u WS' w WS' u WS' ,J K ' a K i°Y. ' a K i" PS '"PS I' 



(20) 



where 6^ F (right leg: { „> , left leg: 0^> ) and { A ' F (right leg: A K F > , left leg: 0^ ) are angles of 
the waist and ankle about the roll axis, respectively, and PF is the pitch of its body about 
the roll axis. Also 9$ s (right leg: 0$ , left leg: 9$ s ) and 9$ (right leg: { K R) , left leg: { K L) ) 
are angles of the waist and knee about the pitch axis, respectively, and # ps is the pitch of its 

body about the pitch axis. Note that the angle of the ankle of each leg about the pitch axis 
was controlled to be parallel to the ground. 
Action u(f) of the robot is determined as follows: 

u(t) = p(t), (21) 

where p(t) is the amount of change of p(t) which is the position of the center of the robot's 
waist. Note that the value of p(t) is a y-coordinate value, and does not include x- or z- 
coordinate values. 

One trial terminated when the robot fell down or time exceeded f to tai = 17.9 [s]. Rewards r(t) 
were determined by the following equation: 

20\p(t) - p(0)\ (during trial) 



m- 



-20*.. 



-t\ 



(on failure) 



(22) 



We can use the value of the difference between its supporting leg and p(t) as rewards, 
but these rewards may represent the ideal position of p(t) because of the use of inverse 
kinematics. Therefore, we used the above equation. Using the equation (22), the closer 
p(t) is to p(0) , the more the rewards increases. Intuitively, it is unsuitable for rewards 
of stamping motion learning, but acquiring a stamping motion only brings more 
rewards, because an up-down motion of the leg is given forcibly by inverse kinematics, 
and it is necessary to change p(t) quite a lot to make the robot stay upright without 
falling down. 
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We used it"™ = l.OxKT 4 , y = 0.9 , ,0 = 0.1 , a =0.02 , A = 0.6 , and A? = 0.01 [s] for 

parameters in Section 2, M=diag(2.0, 1.1, 2.0, 1.1, 2.0, 1.1, 2.0, 1.1, 2.0, 1.1, 2.0, 1.1, 2.0, 1.1, 2.0, 
1.1, 2.0, 1.1, 2.0, 1.1), <5 max =0.5, and a mm =0.135 in Section 3.1, and T add =1.0 [s], r- = 0.9, 
s ma =5.0, and T crase =2.0 [s] in Section 3.2. We also used k = 0.01 and iy =0.003 for the 
parameters of inverse kinematics. 

Figure 13 shows the learning results. The robot can control its balance by moving its 
waist right and left. Figure 14 plots the amount of time taken to fall down. You can 
see that the time increases as the learning progresses. Figure 15 shows the value of 
p(t) in the 986 th trial. It is clear that p(f) changes periodically. These results indicate 

that a stamping motion was acquired, but the robot's idling leg does not rise perfectly 
when we look at the photos in Figure 13. We assume that the first reason for these 
results is that it is difficult to control the angle of ankle using inverse kinematics 
(since inverse kinematics cannot control O^J and 0\) to be parallel to the ground). 
The second reason is that we only used y-coordinate values of the waist for learning, 
and the third is because we used equation (22) for rewards. To solve the second issue, 
we can use its z-coordinate value. Using equation (22), the third reason, a small 
periodic motion is obtained (Figure 16). To solve this problem, we should consider 
another reward function for this experiment. We will explore these areas in our future 
research. 
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Fig. 13. Simulation result (986 th trial). 
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Fig. 14. Time during trial (averaged over 100 repetitions). If the value of a vertical axis is 
large, the stamping motion extends for a long time. 




Fig. 15. Position of p(t) in horizontal direction in 986 th trial. 
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Fig. 16. Ideal angle and output angle of 0$ with AE-GSBFN in 986* trial. The dotted line 
indicates an ideal motion and the solid line indicates the acquired motion with AE-GSBFN. It is 
clear that the acquired motion consists of small periodic motions compared with the deal motion. 
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5. Conclusion 

In this chapter, we proposed a dynamic allocation method of basis functions, AE-GSBFN, in 
reinforcement learning. Through allocation and elimination processes, AE-GSBFN 
overcomes the curse of dimensionality and avoids a fall into local minima. To confirm the 
effectiveness of AE-GSBFN, we applied it to the motion control of a humanoid robot. We 
demonstrated that AE-GSBFN is capable of providing better performance than A-GSBFN, 
and we succeeded in enabling the learning of motion control of the robot. 
The future objective of this study is to do some general comparisons of our method with 
other dynamic neural networks, for example, Fritzke's "Growing Neural Gas" (Fritzke, 
1996) and Marsland's "Grow When Required Nets" (Marsland et al., 2002). An analysis of 
the necessity of hierarchical reinforcement learning methods proposed by Morimoto and 
Doya (Morimoto & Doya, 2000) in relation to the standing up simulation is also an 
important issue for the future study. 
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1. Introduction 



Many aspects of modern life involve the use of intelligent machines capable of operating 
under dynamic interaction with their environment. In view of this, the field of biped 
locomotion is of special interest when human-like robots are concerned. Humanoid robots as 
anthropomorphic walking machines have been in operation for more than twenty years. 
Currently, research on the design and the humanoid robots are one of the most exciting and 
challenging topics in the field of robotics. .The potential applications of this research area are 
very foremost in the middle and long term. Humanoid robots are expected to be servants 
and maintenance machines with the main task to assist human activities in our daily life and 
to replace humans in hazardous operations. It is as obvious as interesting that 
anthropomorphic biped robots are potentially capable to effectively move in all 
unstructured environments where humans do. There also raises strong anticipations that 
robots for the personal use will coexist with humans and provide supports such as the 
assistance for the housework, care of the aged and the physically handicapped. 
Consequently, humanoid robots have been treated as subjects of robotics researches such as 
a research tool for human science, an entertainment/ mental-commit robot or an 
assistant/ agent for humans in the human living environment. 

Humanoid robot are autonomous systems capable of extracting information from their 
environments and using knowledge about the world and intelligence of their duties and 
proper governing capabilities. Intelligent humanoid robots should be autonomous to move 
safely in a meaningful and purposive manner, i.e. to accept high-level descriptions of tasks 
(specifying what the user wants to be done, rather than how to do it) and would execute 
them without further human intervention. Future humanoid robots are likely to have 
greater sensory capabilities, more intelligence for valid reasoning and decision making, 
higher levels of manual dexterity and adequate mobility as compared to humans. Naturally, 
the first approach to making humanoid robots more intelligent was the integration of 
sophisticated sensor systems as computer vision, tactile sensing, ultrasonic and sonar 
sensors, laser scanners and other smart sensors. However, today's sensor products are still 
very limited in interactivity and adaptability to changing environments. As the technology 
and algorithms for real-time 3D vision and tactile sensing improve, humanoid robots will be 
able to perform tasks that involve complex interaction with the environment (e.g. grasping 
and manipulating the objects). A major reason is that uncertainty and dynamic changes 
make the development of reliable artificial systems particularly challenging. On the other 
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hand, to design robots and systems that best adapt to their environment, the necessary 
research includes investigations in the field of mechanical robot design (intelligent 
mechanics), environment perception systems and embedded intelligent control that ought to 
cope with the task complexity, multi-objective decision making, large volume of perception 
data and substantial amount of heuristic information. Also, in the case when the robot 
performs in an unknown environment, the knowledge may not be sufficient. Hence, the 
robot has to adapt to the environment and to be capable of acquiring new knowledge 
through the process of learning. The robot learning is essentially concerned with equipping 
robots with the capacity of improving their behavior over time, based on their incoming 
experiences. 

Although there has been a large number of the control methods used to solve the problem 
of humanoid robot walking , it is difficult to detect a specific trend. Classical robotics and 
also the more recent wave of humanoid and service robots still rely heavily on teleoperation 
or fixed behavior-based control with very little autonomous ability to react to the 
environment. Among the key missing elements is the ability to create control systems that 
can deal with a large movement repertoire, variable speeds, constraints and most 
importantly, uncertainty in the real-world environment in a fast, reactive manner. There are 
several intelligent paradigms that are capable of solving intelligent control problems in 
humanoid robotics. Connectionist theory (NN - neural networks), fuzzy logic (FL), and 
theory of evolutionary computation (GA - genetic algorithms), are of great importance in 
the development of intelligent humanoid robot control algorithms (Katie & Vukobratovic, 
2003a; Katie & Vukobratovic, 2003b; Katie & Vukobratovic, 2005). Due to their strong 
learning and cognitive abilities and good tolerance of uncertainty and imprecision, 
intelligent techniques have found wide applications in the area of advanced control of 
humanoid robots. Also, of great importance in the development of efficient algorithms are 
the hybrid techniques based on the integration of particular techniques such as neuro-fuzzy 
networks, neuro-genetic algorithms and fuzzy-genetic algorithms. 

One approach of departing from teleoperation and manual 'hard coding' of behaviors is by 
learning from experience and creating appropriate adaptive control systems. A rather 
general approach to learning control is the framework of Reinforcement Learning, described in 
this chapter. Reinforcement learning offers one of the most general framework to take 
traditional robotics towards true autonomy and versatility. 

Reinforcement learning typically requires an unambiguous representation of states and 
actions and the existence of a scalar reward function. For a given state, the most traditional of 
these implementations would take an action, observe a reward, update the value function, and 
select as the new control output the action with the highest expected value in each state (for a 
greedy policy evaluation). Updating of value function and controls is repeated until 
convergence of the value function and/ or the policy. This procedure is usually summarized 
under "value update - policy improvement" iterations. The reinforcement learning paradigm 
described above has been successfully implemented for many well-defined, low dimensional 
and discrete problems (Sutton & Barton, 1998; Bertsekas & Tsitsiklis, 1996) and has also 
yielded a variety of impressive applications in rather complex domains in the last 20 years. 
Reinforcement learning is well suited to training mobile robots, in particular teaching a robot a 
new behavior (e.g. avoid obstacles) from scalar feedback. Robotics is a very challenging 
domain for reinforcement learning, However, various pitfalls have been encountered when 
trying to scale up these methods to high dimension, continuous control problems, as typically 
faced in the domain of humanoid robotics. 
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2. Control Problem in Humanoid Robotics 

In spite of a significant progress and accomplishments achieved in the design of a hardware 
platform of humanoid robot and synthesis of advanced intelligent control of humanoid 
robots, a lot of work has still to be done in order to improve actuators, sensors, materials, 
energy accumulators, hardware, and control software that can be utilized to realize user- 
friendly humanoid robots. We are still in an initial stage when the understanding of the 
motor control principles and sensory integration subjacent to human walking is concerned. 
Having in mind the very high requirements to be met by humanoid robots, it is necessary to 
point out the need for increasing the number of degrees of freedom (DOFs) of their 
mechanical configuration and studying in detail some previously unconsidered phenomena 
pertaining to the stage of forming the corresponding dynamic models. Besides, one should 
emphasize the need for developing appropriate controller software that would be capable of 
meeting the most complex requirements of accurate trajectory tracking and maintaining 
dynamic balance during regular (stationary) gait in the presence of small perturbations, as 
well as preserving robot's posture in the case of large perturbations. Finally, one ought to 
point out that the problem of motion of humanoid robots is a very complex control task, 
especially when the real environment is taken into account, requiring as a minimum, its 
integration with the robot's dynamic model. 

There are various sources of control problems and various tasks and criteria that must be 
solved and fulfilled in order to create valid walking and other functions of humanoid 
robots. Previous studies of biological nature, theoretical and computer simulation, have 
focussed on the structure and selection of control algorithms according to different criteria 
such as energy efficiency, energy distribution along the time cycle, stability, velocity, 
comfort, mobility, and environment impact. Nevertheless, in addition to these aspects, it is 
also necessary to consider some other issues: capability of mechanical implementation due 
to the physical limitations of joint actuators, coping with complex highly-nonlinear 
dynamics and uncertainties in the model-based approach, complex nature of periodic and 
rhythmic gait, inclusion of learning and adaptation capabilities, computation issues, etc. 
The major problems associated with the analysis and control of bipedal systems are the 
high-order highly-coupled nonlinear dynamics and furthermore, the discrete changes in the 
dynamic phenomena due to the nature of the gait. Irrespective of the humanoid robot 
structure and complexity, the basic characteristic of all bipedal systems are: a) the DOF 
formed between the foot and the ground is unilateral and underactuated ; b) the gait 
repeatability (symmetry) and regular interchangeability of the number of legs that are 
simultaneously in contact with the ground. During the walk, two different situations arise 
in sequence: the statically stable double-support phase in which the mechanism is 
supported on both feet simultaneously, and statically unstable single-support phase when 
only one foot of the mechanism is in contact with the ground. Thus, the locomotion 
mechanism changes its structure during a single walking cycle from an open to a closed 
kinematic chain. Also, it is well known that through the process of running the robot can be 
most of the time in no-support phase. In this case, the control schemes that are successful for 
walking problem are not necessarily successful for the running problem. All the mentioned 
characteristics have to be taken into account in the synthesis of advanced control algorithms 
that accomplish stable, fast and reliable performance of humanoid robots. 
The stability issues of humanoid robot walking are the crucial point in the process of control 
synthesis. In view of this humanoid walking robots can be classified in three different 
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categories. First category represents static walkers, whose motion is very slow so that the 
system's stability is completely described by the normal projection of the Centre of Gravity, 
which only depends on the joint's position. Second category represents dynamic walkers, 
biped robots with feet and actuated ankles. Postural stability of dynamic walkers depends 
on joint's velocities and acceleration too. These walkers are potentially able to move in a 
static way provided they have large enough feet and the motion is slow. The third category 
represents purely dynamic walkers, robots without feet. In this case the support polygon 
during the single-support phase is reduced to a point, so that static walking is not possible. 
In the walk with dynamic balance, the projected centre of mass is allowed outside of the 
area inscribed by the feet, and the walker may essentially fall during parts of the walking 
gait. The control problems of dynamic walking are more complicated than in walking with 
static balance, but dynamic walking patterns provide higher walking speed and greater 
efficiency, along with more versatile walking structures. 

The rotational equilibrium of the foot is the major factor of postural instability with legged 
robots. The question has motivated the definition of several dynamic-based criteria for the 
evaluation and control of balance in biped locomotion. The most common criterion are the centre 
of pressure (CoP), the zero-moment point (ZMP) concept, that has gained widest acceptance and 
played a crucial role in solving the biped robot stability and periodic walking pattern synthesis 
(Vukobratovic and Juricic, 1969). The ZMP is defined as the point on the ground about which the 
sum of all the moments of the active forces equals zero. If the ZMP is within the convex hull of all 
contact points between the foot and the ground, the biped robot can walk. 

For a legged robot walking on complex terrain, such as a ground consisting of soft and hard 
uneven parts, a statically stable walking manner is recommended. However, in the cases of 
soft terrain, up and down slopes or unknown environment, the walking machine may lose 
its stability because of the position planning errors and unbalanced foot forces. Hence, 
position control alone is not sufficient for practical walking, position/force control being 
thus necessary. Foot force control can overcome these problems, so that foot force control is 
one of the ways to improve the terrain adaptability of walking robots. For example, in the 
direction normal to the ground, foot force has to be controlled to ensure firm foot support 
and uniform foot force distribution among all supporting legs; foot force in the tangential 
direction has to be monitored to avoid slippage. 

A practical biped needs to be more like a human - capable of switching between different 
known gaits on familiar terrain and learning new gaits when presented with unknown 
terrain. In this sense, it seems essential to combine force control techniques with more 
advanced algorithms such as adaptive and learning strategies. Inherent walking patterns 
must be acquired through the development and refinement by repeated learning and 
practice as one of important properties of intelligent control of humanoid robots. Learning 
enables the robot to adapt to the changing conditions and is critical to achieving 
autonomous behaviour of the robot. 

Many studies have given weight to biped walking which is based only on stability of the 
robot: steady-state walking, high-speed dynamic walking, jumping, and so on. A humanoid 
robot is however, a kind of integrated machine: a two-arm and two-leg mechanism. Hence, 
we must not only focus on the locomotion function but also on arm's function with this kind 
of machines; manipulation and handling being major functions of robot's arms. 
When the ground conditions and stability constraint are satisfied, it is desirable to select a 
walking pattern that requires small torque and velocity of the joint actuators. Humanoid 
robots are inevitably restricted to a limited amount of energy supply. It would therefore be 
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advantageous to consider the minimum energy consumption, when cyclic movements like 
walking are involved. With this in mind, an important approach in research is to optimise 
simultaneously both the humanoid robot morphology and control, so that the walking 
behaviour is optimised instead of optimising walking behaviour for the given structure of 
humanoid robot. Optimum structures can be designed when the suitable components and 
locomotion for the robot are selected appropriately through evolution. It is well known that 
real-time generation of control algorithms based on highly-complex nonlinear model of 
humanoid robot commonly suffers from a large amount of computation. Hence, new time- 
efficient control methods need to be discovered to control humanoid robots in real time, to 
overcome the mentioned difficulty. 

In summary, conventional control algorithms is based on a kinematics and dunamic 
modeling of the mechanism structure. (Vukobraovic et al, 1990). This implies precise 
identification of intrinsinc parameters of biped's robot mechanism , requires a high precise 
measurement of humanoid state variables and needs for precise evaluation of interaction 
forces between foot and ground. Moreover, these methods require a lot of computation 
together with some problems related to mathematical tractability, optimisation, limited 
extendability and limited biological plausibility. The second approach based on intelligent 
control techniques have a potential to overcome the mentioned constraints. In this case, it is 
not necessary to know perfectly the parameters and characteristics of humanoid 
mechanism. Also, these methods take advantage from off-line and on-line learning 
capabilities. This last point is very important because generally the learning ability allows 
increasing the autonomy of the bioed robot. 

3. Reinforcement Learning Framework and Reinforcement Learning 
Algoritgms in Humanoid Robotics 

Recently, reinforcement learning has attracted attention as a learning method for studying 
movement planning and control (Sutton & Barton, 1998; Bertsekas & Tsitsiklis, 1996). 
Reinforcement learning is a kind of learning algorithm between supervised and 
unsupervised learning algorithms which is based on Markov decision process (MDP). 
Reinforcement learning concept is based on trial and error methodology and constant 
evaluation of performance in constant interaction with environment. 

In many situations the success or failure of the controller is determined not only by one 
action but by a succession of actions. The learning algorithm must thus reward each action 
accordingly. This is referred to as the problem of delayed reward. There are two basic 
methods that are very successful in solving this problem, TD learning (Sutton & Barto, 1998) 
and Q learning (Watkins & Dayan, 1992). Both methods build a state space value function 
that determines how close each state is to success or failure. Whenever the controller 
outputs an action, the system moves from one state to an other. The controller parameters 
are then updated in the direction that increases the state value function. 

For the solution of large-scale MDPs or continuous state and action spaces, it's impossible 
for reinforcement learning agent to go through all the states and actions. In order to realize 
the optimal approximation for value functions of continuous states and actions respectively, 
therefore, learning agent must have generalization ability. In other words, such an agent 
should be able to utilize finite learning experience to acquire and express a good knowledge 
of a large-scale space effectively. How to design a function approximator with abilities of 
high generalization and computation efficiency has become a key problem for the research 
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field of reinforcement learning. Using prior knowledge about the desired motion can greatly 
simplify controller synthesis. Imitation-based learning or learning from demonstration 
allow for policy search to focus only on the areas of the search space that pertain to the task 
at hand. Both, model-based and model-free approaches exist to find optimal policies when 
agents are allowed to act for unlimited time. For physical agents, such as humanoid robots 
acting in the real world, it is much more difficult to gain experience. Hence, the exhaustive 
exploration of highdimensional state and action spaces is not feasible. For a physical robot, 
it is essential to learn from few trials in order to have some time left for exploitation. 
The robot learning is essentially concerned with equipping robots with the capacity of 
improving their behavior over time, based on their incoming experiences. For instance, it 
could be advantageous to learn dynamics models, kinematic models, impact models, for 
model-based control techniques. Imitation learning could be employed for the teaching of 
gaits patterns, and reinforcement learning could help tuning parameters of the control 
policies in order to improve the performance with respect to given cost functions. 
Dynamic bipedal walking is difficult to learn for a number of reasons. First, biped robots 
typically have many degrees of freedom, which can cause a combinatorial explosion for 
learning systems that attempt to optimize performance in every possible configuration of 
the robot. Second, details of the robot dynamics such as uncertainties in the ground contact 
and nonlinear friction in the joints must be only experimentally validated. Since it is only 
practical to run a small number of learning trials on the real robot, the learning algorithms 
must perform well after obtaining a very limited amount of data. Finally, learning 
algorithms for dynamic walking must deal with dynamic discontinuities caused by 
collisions with the ground and with the problem of delayed reward -torques applied at one 
time may have an effect on the performance many steps into the future. 

In area of humanoid robotics, there are several approaches of reinforcement learning (Benbrahim 
& Franklin, 1997; Chew & Pratt, 2002; Li et al. , 1992; Mori et al„ 2004; Morinoto et al., 2004; 
Nagasaka et al, 1999; Nakamura et al, 2003; Salatian et al, 1997; Zhou & Meng, 2000) with 
additional demands and requirements because high dimensionality of the control problem. 
In the paper (Benbrahim & Franklin, 1997), it is shown how reinforcement learning is used 
within a modular control architecture to enable a biped robot to walk. The controller structure 
consists of central (CPG) and peripheral controllers. The learning architecture succeeded in 
dealing with the problems of large numbers of inputs, knowledge integration and task 
definition. The central controller controls the robot in nominal situations, and the peripheral 
controllers intervene only when they consider that the central controller's action contradicts their 
individual control policies (Figure 1). The action is generated by computing the average of the 
outputs of all controllers that intervene including the central controller. Each peripheral 
controller's role is to correct the central controller's mistakes and issue an evaluation of the 
general behaviour. The central controller then uses the average of all evaluations to learn a 
control policy that accommodates the requirements of as many peripheral controllers as possible. 
The central controller as well as some of the peripheral controllers in this study use 
adaptive CMAC neural networks. Because of modular nature, it is possible to use several 
neural networks with small numbers of inputs instead of one large neural network. This 
dramatically increases the learning speed and reduces the demand on memory and 
computing power. The architecture also allows easy incorporation of any knowledge by 
adding a peripheral controller that represents that knowledge The CPG uses reinforcement 
learning in order to learn an optimal policy. The CMAC weights are updated using the 
reinforcement signals received from the peripheral controllers. Reinforcement learning is 
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well suited for this kind of application. The system can try random actions and choose 
those that yield good reinforcement. The reinforcement learning algorithm uses the actor- 
critic configuration. It searches the action space using a Stochastic Real Valued (SRV) unit 
at the output. The reinforcement signal is generated using TD Learning. The CMAC neural 
networks used in the biped's learning are pre-trained using a biped walking robot simulator 
and predefined simple walking gates. The size of the search domain is determined by the 
standard deviation of the Gaussian unit. If the standard deviation is too small, the system 
will have a very small search domain. This decreases the learning speed and increases the 
system's vulnerability to the local minima problem. If the factor is too large, the system's 
performance will not reach its maximum because there will always be a randomness even if 
the system has learned an optimal solution. It is in general safer to use a large factor than a 
small one. Even though this learning algorithms and architecture have successfully solved 
the problem of dynamic biped walking, there are many improvements that can be added to 
increase learning speed, robustness, and versatility. The performance must also be 
improved by dynamically setting the PID gains to deal with each specific situation. 




► Joint angles 



Fig.l. Controller Architecture Benbrahim & Franklin. 

More recently, (Kun & Miller III, 1999) has developed a hierarchical controller that combines 
simple gait oscillators, classical feedback control techniques and neural network learning, and 
does not require detailed equations of the dynamics of walking. The emphasis is on the real- 
time control studies using an experimental ten-axis biped robot with foot force sensors. The 
neural network learning is achieved using CMAC controller, where CMAC neural networks 
were used essentially as context sensitive integral errors in the controller, and the control 
context being defined by the CMAC input vector. There are 3 different CMAC neural 
networks for humanoid posture control. The Front/ Back Balance CMAC neural network was 
used to provide front/ back balance during standing, swaying and walking. The training of 
this network is realized using data from foot sensors. The second CMAC neural network is 
used for Right/ Left Balance, to predict the correct knee extension required to achieve 
sufficient lateral momentum for lifting the corresponding foot for the desired length of time. 
The training of this network is realized using temporal difference method based on the 
difference between the desired and real time of foot rising. The third CMAC network is used 
to learn kinematically consistent robot postures. In this case, training is also realized by data 
from foot sensors. The results indicated that the experimental biped was able to learn the 
closed-chain kinematics necessary to shift body weight side-to-side while maintaining good 
foot contact. Also, it was able to learn the quasi-static balance required to avoid falling forward 
or backward while shifting body weight side-to-side at different speeds. It was able to learn 
the dynamic balance in order to lift a foot off the floor for a desired length of time and different 
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initial conditions. There were, however, many limitations (limited step length, slow walking, 
no adaptation for left-right balance, no possibility of walking on sloping surfaces). The new 
dynamically balanced scheme for handling variable-speed gait was proposed based on the 
preplanned but adaptive motion sequences in combination with closed-loop reactive control. 
This allows the algorithm to improve the walking performance over consecutive steps using 
adaptation, and to react to small errors and disturbances using reactive control. New sensors 
(piezoresistive accelerometers and two solid-state rate gyroscopes) are mounted on the new 
UNH biped (Fig. 2). 




Fig. 2. The UNH biped walking. 

Training of neural networks is realized through the process of temporal difference learning 
using information about ZMP from robot foot sensors. The CM AC neural networks were 
first trained during repetitive foot-lift motion similar to marching in place. Then, training 
was carried out during the attempts at walking for increased step length and gait speeds. 
The experimental results indicate that the UNH biped robot can walk with forward 
velocities in the range of 21 - 72 cm/min, with sideways leaning speed in the range of 3.6 - 
12.5 cm/s. The main characteristic of this controller is the synthesis of the control signal 
without dynamic model of biped. The proposed controller could be used as a basis for 
similar controllers of more complex humanoid robots in the future research. However, this 
controller is not of a general nature, because it is suitable only for the proposed structure of 
biped robot and must be adapted for the bipeds with different structures. More research 
efforts are needed to simplify the controller structure, to increase the gait speed, and to 
ensure stability of dynamic walking. 

The policy gradient method is one of the reinforcement learning methods successfully 
applied to learn biped walking on actual robots (Benbrahim & Franklin, 1997; Tedrake et al., 
2004). However, this method requires hours to learn a walking controller, and approach in 
(Tedrake et al., 2004) requires a mechanically stable robot. 

There are direct (model-free) and model-based reinforcement learning (Doya, 2000; Sutton 
& Barto, 1998) . The direct approach to RL is to apply policy search directly without learning 
a model of the system. In principle, the model-based reinforcement learning is more data 
efficient than direct reinforcement learning. Also, it was concluded that model-based 
reinforcement learning finds better trajectories, plans and policies, and handles changing 
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goals more efficiently. On the other hand, reported that a model-based approach to 
reinforcement learning is able to accomplish given tasks much faster than without using 
knowledge of the environment. 

The problem of biped gait synthesis using the reinforcement learning with fuzzy evaluative 
feedback is considered in paper (Zhoy & Meng, 2000). As first, initial gait from fuzzy rules is 
generated using human intuitive balancing scheme. Simulation studies showed that the 
fuzzy gait synthesizer can only roughly track the desired trajectory. A disadvantage of the 
proposed method is the lack of practical training data. In this case there are no numerical 
feedback teaching signal, only evaluative feedback signal exists (failure or success), exactly 
when the biped robot falls (or almost falls) down. Hence, it is a typical reinforcement 
learning problem. The dynamic balance knowledge is accumulated through reinforcement 
learning constantly improving the gait during walking. Exactly, it is fuzzy reinforcement 
learning that uses fuzzy critical signal. For human biped walk, it is typical to use linguistic 
critical signals such as "near-fall-down", "almost-success", "slower", "faster", etc. In this 
case, the gait synthesizer with reinforcement learning is based on a modified GARIC 
(Generalized Approximate Reasoning for Intelligent Control) method. This architecture of 
gait synthesizer consists of three components: action selection network (ASN), action 
evaluation network (AEN), and stochastic action modifier (SAM) (Fig. 3) The ASM maps a 
state vector into a recommended action using fuzzy inference. The training of ASN is 
achieved as with standard neural networks using error signal of external reinforcement. The 
AEN maps a state vector and a failure signal into a scalar score which indicates the state 
goodness. It is also used to produce internal reinforcement. The SAM uses both 
recommended action and internal reinforcement to produce a desired gait for the biped. The 
reinforcement signal is generated based on the difference between desired ZMP and real 
ZMP in the x-y plane. In all cases, this control structure includes on-line adaptation of gait 
synthesizer and local PID regulators. The approach is verified using simulation 
experiments. In the simulation studies, only even terrain for biped walking is considered, 
hence the approach should be verified for irregular and sloped terrain. 



Faflure signal Rxl 




Failure signal Ryl 



Fig. 3. The architecture of the reinforcement learning based gait synthesizer. 
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9 d , 



Q d are the desired joint angles of the 



where Xzmp,Yzmp are the ZMP coordinates; Q d 

biped gait. 

There are some research works that include the application of reinforcement learning 
control algorithms for passive or semi-passive biped walkers (Fig.4) (Tedrake et al., 2004; 
Morimoto et al, 2005; Schuitema et al., 2005). 




Fig. 4. Simple Passive Dynamic Walker. 

In (Schuitema et al. , 2005), Reinforcement Learning passive walker is proposed as model 
free approach with fully driven optimization method. This approach is adaptive, in the 
sense that when the robot or its environment changes without notice, the controller can 
adapt until performance is again maximal. Optimization relatively easily towards several 
goals, such as: minimum cost of transport, largest average forward speed, or both. Statistical 
learning algorithm makes small changes in control parameters and uses correlations 
between control changes and system error changes. Stochasticity is added to deterministic 
feedback control policy. Gradient following algorithm based on temporal difference error 
was applied. 

In paper (Morimoto et al., 2004) a model-based reinforcement learning algorithm for 
biped walking in which the robot learns to appropriately place the swing leg was 
proposed. This decision is based on a learned model of the Poincare map of the periodic 
walking pattern. The model maps from a state at the middle of a step and foot placement 
to a state at next middle of a step. Actor-Critic algorithms of reinforcement learning has a 
great potential in control of biped robots. For a example, a general algorithm for 
estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced in paper 
(Peters et al., 2003). This algorithm converges to the nearest local minimum of the cost 
function with respect to the Fisher in formation metric under suitable conditions. It offers 
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a promising route for the development of reinforcement learning for truly high- 
dimensionally continuous state-action systems. In paper (Tedrake et al., 2004) a learning 
system which is able to quickly and reliably acquire a robust feedback control policy for 
3D dynamic walking from a blank-slate using only trials implemented on physical robot. 
The robot begins walking within a minute and learning converges in approximately 20 
minutes. This success can be attributed to the mechanics of our robot, which are modelled 
after a passive dynamic walker, and to a dramatic reduction in the dimensionality of the 
learning problem. The reduction of the dimensionality was realized by designing a robot 
with only 6 internal degrees of freedom and 4 actuators, by decomposing the control 
system in the frontal and sagittal planes, and by formulating the learning problem on the 
discrete return map dynamics. A stochastic policy gradient algorithm to this reduced 
problem was applied with decreasing the variance of the update using a state-based 
estimate of the expected cost. This optimized learning system works quickly enough that 
the robot is able to continually adapt to the terrain as it walks. The learning on robot is 
performed by a policy gradient reinforcement learning algorithm (Baxter & Bartlett, 2001; 
Kimura & Kobayashi, 1998; Sutton et al., 2000). 

Some researxhers ( Kamio & Iba, 2005) were efficiently applied hybrid version of 
reinforcement learning structures, integrating genetic programming and Q-Learning 
method on real humanoid robot. 

4. Hybrid Reinforcement Learning Control Algorithms for Biped Walking 

The new integrated hybrid dynamic control structure for the humanoid robots will be 
proposed, using the model of robot mechanism. Our approach consists in departing from 
complete conventional control techniques by using hybrid control strategy based on model- 
based approach and learning by experience and creating the appropriate adaptive control 
systems. Hence, the first part of control algorithm represents some kind of computed torque 
control method as basic dynamic control method, while the second part of algorithm is 
reinforcement learning architecture for dynamic compensation of ZMP ( Zero-Moment- 
Point) error. 

In the synthesis of reinforcement learning strycture, two algorithms will be shown, that are 
very successful in solving biped walking problem: adaptive heuristic approach (AHC) 
approach, and approach based on Q learning, To solve reinforcement learning problem, the 
most popular approach is temporal difference (TD) method (Sutton & Barto, 1998). Two TD- 
based reinforcement learning approaches have been proposed: The adaptive heuristic critic 
(AHC) (Barto et al., 1983) and Q-learning (Watkins & Dayan, 1992). In AHC, there are two 
separate networks: An action network and an evaluation network. Based on the AHC, In 
(Berenji & Khedkar, 1992), a generalized approximate reasoning-based intelligent control 
(GARIC) is proposed, in which a two-layer feedforward neural network is used as an action 
evaluation network and a fuzzy inference network is used as an action selection network. 
The GARIC provides generalization ability in the input space and extends the AHC 
algorithm to include the prior control knowledge of human operators. One drawback of 
these actor-critic architectures is that they usually suffer from the local minimum problem in 
network learning due to the use of gradient descent learning method. 

Besides the aforementioned AHC algorithm-based learning architecture, more and more 
advances are being dedicated to learning schemes based on Q-learning. Q-learning collapses 
the two measures used by actor/ critic algorithms in AHC into one measure referred to as 
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the Q-value. It may be considered as a compact version of the AHC, and is simpler in 
implementation. Some Q-learning based reinforcement learning structures have also been 
proposed (Glorennec & Jouffe, 1997; Jouffe, 1998; Berenji, 1996).. In (Berenji & Jouffe, 1997), a 
dynamic fuzzy Q-learning is proposed for fuzzy inference system design. In this method, 
the consequent parts of fuzzy rules are randomly generated and the best rule set is selected 
based on its corresponding Q-value. The problem in this approach is that if the optimal 
solution is not present in the randomly generated set, then the performance may be poor. In 
(Jouffe, 1998), fuzzy Q-learning is applied to select the consequent action values of a fuzzy 
inference system. For these methods, the consequent value is selected from a predefined 
value set which is kept unchanged during learning, and if an improper value set is assigned, 
then the algorithm may fail. In (Berenji, 1996), a GARIC-Q method is proposed. This method 
works at two levels, the local and the top levels. At the local level, a society of agents (fuzzy 
networks) is created, with each learning and operating based on GARIC. While at the top 
level, fuzzy Q-learning is used to select the best agent at each particular time. In contrast to 
the aforementioned fuzzy Q-learning methods, in GARIC-Q, the consequent parts of each 
fuzzy network are tunable and are based on AHC algorithm. Since the learning is based on 
gradient descent algorithm, it may be slow and may suffer the local optimum problem. 

4.1 Model of the robot's mechanism 

The mechanism possesses 38 DOFs. Taking into account dynamic coupling between 
particular parts (branches) of the mechanism chain, a relation that describes the overall 
dynamic model of the locomotion mechanism in a vector form: 

P + J T (q)F = H(q)q + h(q,q) (1) 

where: P e R ml is the vector of driving torques at the humanoid robot joints; F e R" xX is the 
vector of external forces and moments acting at the particular points of the mechanism; 
H <eR" s " is the square matrix that describes 'full' inertia matrix of the mechanism; heR"' 
is the vector of gravitational, centrifugal and Coriolis moments acting at n mechanism 
joints; JeR" m is the corresponding Jacobian matrix of the system; « = 38, is the total 
number of DOFs; qeR 1 " 1 is the vector of internal coordinates; qeR nX[ is the vector of 
internal velocities. 

4.2 Definition of control criteria 

In the control synthesis for biped mechanism, it is necessary to satisfy certain natural 
principles. The control must to satisfy the following two most important criteria: (i) accuracy 
of tracking the desired trajectories of the mechanism joints (ii) maintenance of dynamic 
balance of the mechanism during the motion. Fulfillment of criterion (i) enables the 
realization of a desired mode of motion, walk repeatability and avoidance of potential 
obstacles. To satisfy criterion (ii) it means to have a dynamically balanced walk. 

4.3. Gait phases and indicator of dynamic balance 

The robot's bipedal gait consists of several phases that are periodically repeated. Hence, 
depending on whether the system is supported on one or both legs, two macro-phases can 
be distinguished: (i) single-support phase (SSP) and (ii) double-support phase (DSP). 
Double-support phase has two micro-phases: (i) weight acceptance phase (WAP) or heel 
strike, and (ii) weight support phase (WSP). Fig. 5 illustrates these gait phases, with the 
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projections of the contours of the right (RF) and left (LF) robot foot on the ground surface, 
whereby the shaded areas represent the zones of direct contact with the ground surface. 






□ 
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phase (WAP) phase (WSP) 



LF RF 



swing phase 
(SB) 



double support phase (DSP) single support 



single support 
phase (SSP) phase (SSP) 

Fig. 5. Phases of biped gait. 

The indicator of the degree of dynamic balance is the ZMP, i.e. its relative position with 
respect to the footprint of the supporting foot of the locomotion mechanism. The ZMP is 
defined (Vuobratovic & Juricic, 1969) as the specific point under the robotic mechanism foot 
. at which the effect of all the forces acting on the mechanism chain can be replaced by a 
unique force and all the rotation moments about the x and y axes are equal zero. Figs 6a and 
6b show details related to the determination of ZMP position and its motion in a 
dynamically balanced gait. The ZMP position is calculated based on measuring reaction 
forces F.i = \ .. 4 under the robot foot. Force sensors are usually placed on the foot sole in 

the arrangement shown in Fig. 6 a. Sensors' positions are defined by the geometric 
quantities //and / . If the point o is assumed as the nominal ZMP position (Fig. 6a), 

then one can use the following equations to determine the relative ZMP position with 
respect to its nominal: 

AM^ =±[{F 2+ F 4 )-(F 2 ° + F:)]-±[{F l+ F 3 )-(F: + F 3 )], 
AM^ =1 2 [{F 3 +F 4 )-(F 3 ° +F:)]-l 1 [{F i +F 2 )-(F i ° + F 2 )], 



F 



(z) 



■■T.".. 



Ax 



( imp ) 



-am;,"*'' 



Ay 



(zmp) _ AM^> 



where F and F°.i = l,..,4, are the measured and nominal values of the ground reaction 
force; AM (zmp) and AM ( * mp) are deviations of the moments of ground reaction forces around 
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the axes passed through the n ; F {z) is the resultant force of ground reaction in the vertical 

1 ° zmp r ^ J 

z-direction, while Ax l "" p) and Ay (:mp) are the displacements of ZMP position from its 

nominal o • The deviations Ax izmp> and Ay (zmp) of the ZMP position from its nominal 

position in x- and y-direction are calculated from the previous relation . The instantaneous 
position of ZMP is the best indicator of dynamic balance of the robot mechanism. In Fig. 6b 
are illustrated certain areas ( Z Z and Z )/ the so-called safe zones of dynamic balance of 

the locomotion mechanism. The ZMP position inside these "safety areas" ensures a 
dynamically balanced gait of the mechanism whereas its position outside these zones 
indicates the state of loosing the balance of the overall mechanism, and the possibility of its 
overturning. The quality of robot balance control can be measured by the success of keeping 
the ZMP trajectory within the mechanism support polygon (Fig. 6b). 

1 1 
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Wb 



^ <'<£•* + » s'&itv.clioiy-bf motion 



/.. 



M 



.^.._g£±^d|._ 



-+*- 



,D 



Fig. 6. Zero-Moment Point: a) Legs of "Toyota " humanoid robot ; General arrangement of 
force sensors in determining the ZMP position; b) Zones of possible positions of ZMP when 
the robot is in the state of dynamic balance. 



4.4 Hybrid intelligent control algorithm with AHC reinforcement structure 

Biped locomotion mechanism represents a nonlinear multivariable system with several 
inputs and several outputs. Having in mind the control criteria, it is necessary to control the 
following variables: positions and velocities of the robot joints and ZMP position . In 
accordance with the control task, we propose the application of the hybrid intelligent 
control algorithm based on the dynamic model of humanoid system . Here we assume the 
following: (i) the model (1) describes sufficiently well the behavior of the system; (ii) desired 
(nominal) trajectory of the mechanism performing a dynamically balanced gait is known. 
(iii) geometric and dynamic parameters of the mechanism and driving units are known and 
constant. These assumptions can be taken as conditionally valid, the rationale being as 
follows: As the system elements are rigid bodies of unchangeable geometrical shape, the 
parameters of the mechanism can be determined with a satisfactory accuracy. 
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Based on the above assumptions, in Fig. 7 a block-diagram of the intelligent controller for 
biped locomotion mechanism is proposed. It involves two feedback loops: (i) basic dynamic 
controller for trajectory tracking, (ii) intelligent reaction feedback at the ZMP based on AHC 
reinforcement learning structure. The synthesized dynamic controller was designed on the 
basis of the centralized model. The vector of driving moments P represents the sum of the 
driving moments P,P ,. The torques A are determined so to ensure precise tracking of the 

robot's position and velocity in the space of joints coordinates. The driving torques A are 
calculated with the aim of correcting the current ZMP position with respect to its nominal. 
The vector A of driving torques represents the output control vector. 




Fig. 7 Hybrid Controller based on Actor-Critic Method for trajectory tracking. 



4.5 Basic Dynamic Controller 

The proposed dynamic control law ha the following form: 

P = H(q)[q + K v (q-q ) + K p (q-q )] + h(q,g) (2) 

where H^h and J are the corresponding estimated values of the inertia matrix, vector of 
gravitational, centrifugal and Coriolis forces and moments and Jacobian matrix from the 
model (1). The matrices k eR" x " ar, d K e.R" x " are the corresponding matrices of position 

and velocity gains of the controller. The gain matrices K an d K can be chosen in the 

diagonal form by which the system is decoupled into n independent subsystems. This 
control model is based on centralized dynamic model of biped mechanism. 
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4.6 Compensator of dynamic reactions based on reinforcement learning structure 

In the sense of mechanics, locomotion mechanism represents an inverted multi link 
pendulum. In the presence of elasticity in the system and external environment factors, the 
mechanism's motion causes dynamic reactions at the robot supporting foot. Thus, the state 
of dynamic balance of the locomotion mechanism changes accordingly. For this reason it is 
essential to introduce dynamic reaction feedback at ZMP in the control synthesis. There are 
relationship between the deviations of ZMP positions ( Ax (:mp) , Ay (zmp) ) from its nominal 
position o in the motion directions x and y and the corresponding dynamic reactions 
M {2mp) and M {2mp) acting about the mutually orthogonal axes that pass through the point 

X y 

®zm P - M l:mp> ei? lxl and M {zmp) eR ul represent the moments that tend to overturn the robotic 

mechanism. M (zmp) e R 2 ' 1 and M (zmp) e R 2 " 1 are the vectors of nominal and measured values 

of the moments of dynamic reaction around the axes that pass through the ZMP (Fig. 6a). 
Nominal values of dynamic reactions, for the nominal robot trajectory, are determined off- 
line from the mechanism model and the relation for calculation of ZMP; AM l:mp) e R 2 '' is the 

vector of deviation of the actual dynamic reactions from their nominal values; p g R 2 * 1 is 

the vector of control torques, ensuring the state of dynamic balance. 

The control torques p has to be displaced to the some joints of the mechanism chain. Since 

the vector of deviation of dynamic reactions AM <:mp> has two components about the 

mutually orthogonal axes x and y, at least two different active joints have to be used to 
compensate for these dynamic reactions. Considering the model of locomotion mechanism, 
the compensation was carried out using the following mechanism joints: 9, 14, 18, 21 and 25 
to compensate for the dynamic reactions about the x-axis, and 7, 13, 17, 20 and 24 to 
compensate for the moments about the y-axis. Thus, the joints of ankle, hip and waist were 
taken into consideration. Finally, the vector of compensation torques P was calculated on 
the basis of the vector of the moments p in the case when compensation of ground 
dynamic reactions is performed using all six proposed joints, using the following relation 
p 2 (9) = p 2 (14) = p 2 (l 8) = p 2 (21) = p 2 (25) = M5P dr (3) 

p 2 (7) = p 2 (l 3) = p 2 (l 7) = p 2 (20) = p 2 (24) = M5P dr (4) 

In nature, biological systems use simultaneously a large number of joints for correcting their 
balance. In this work, for the purpose of verifying the control algorithm, the choice was 
restricted to the mentioned ten joints: 7, 9, 13, 14, 17, 18, 20, 21, 24, and 25. Compensation of 
ground dynamic reactions is always carried out at the supporting leg when the locomotion 
mechanism is in the swing phase, whereas in the double-support phase it is necessary to 
engage the corresponding pairs of joints (ankle, hip, waist) of both legs. 

On the basis of the above the fuzzy reinforcement control algorithm is defined with respect 
to the dynamic reaction of the support at ZMP. 

4.7. Reinforcement Actor-Critic Learning Structure 

This subsection describes the learning architecture that was developed to enable biped 
walking. A powerful learning architecture should be able to take advantage of any available 
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knowledge. The proposed reinforcement learning structure is based on Actor-Critic 
Methods (Sutton & Barto, 1998). 

Actor-Critic methods are temporal difference (TD) methods, that have a separate memory 
structure to explicitly represent the control policy independent of the value function. In this 
case, control policy represents policy structure known as Actor with aim to select the best 
control actions. Exactly, the control policy in this case, represents the set of control 
algorithms with different control parameters. The input to control policy is state of the 
system, while the output is control action (signal). It searches the action space using a 
Stochastic Real Valued (SRV) unit at the output. The unit's action uses a Gaussian random 
number generator. The estimated value function represents a Critic, because it criticizes the 
control actions made by the actor. Typically, the critic is a state-value function which takes 
the form of TD error necessary for learning. TD error depends also from reward signal, 
obtained from environment as result of control action. The TD Error can be scalar or fuzzy 
signal that drives all learning in both actor and critic. 

Practically, in proposed humanoid robot control design, it is synthesized the new modified 
version of GARIC reinforcement learning structure (Berenji & Khedkar, 1992) . The 
reinforcement control algorithm is defined with respect to the dynamic reaction of the 
support at ZMP, not with respect to the state of the system. In this case external 
reinforcement signal (reward) R is defined according to values of ZMP error. 
Proposed learning structure consists from two networks: AEN(Action Evaluation Network) 
- CRITIC and ASN(Action Selection Network) - ACTOR. AEN network maps position and 
velocity tracking errors and external reinforcement signal R in scalar or fuzzy value which 
represent the quality of given control task. The output scalar value of AEN is important for 
calculation of internal reinforcement signal. R AEN constantly estimate internal 
reinforcement based on tracking errors and value of reward. AEN is standard 2-layer 
feedforward neural network (perceptron) with one hidden layer. The activation function in 
hidden layer is sigmoid, while in the output layer there are only one neuron with linear 
function. The input layer has a bias neuron. The output scalar value V is calculated based 
on product of set C of weighting factors and values of neurons in hidden later plus product 
of set A of weighting factors and input values and bias member. There are also one more set 
of weighting factors B between input layer and hidden layer. The number of neurons on 
hidden later is determined as 5. Exactly, the output V can be represented by the following 
equation: 

v = £ B.AMl zmp) + E Cjffc A£Mf mp) ) (5) 

< ji 

where J is sigmoid function. 

The most important function of AEN is evaluation of TD error, exactly internal 
reinforcement. The internal reinforcement is defined as TD(0) error defined by the following 
equation: 

R(t + 1) = 0, begining state (6) 

R(t + l) = R(t)-v(t), failure state (7) 

R(t + \) = R(t) + yv(t + \)-v(t), otherwise (8) 
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where y is a discount coefficient between and 1 (in this case y is set to 0.9). 

ASN (action selection network) maps the deviation of dynamic reactions AM lzmp} &R 2xl in 
recommended control torque. The structure of ASN is represented by The ANFIS - Sugeno- 
type adaptive neural fuzzy inference systems. There are five layers: input layer, antecedent 
part with fuzzification, rule layer, consequent layer, output layer wit defuzzification. This 
system is based on fuzzy rule base generated by expert knoOwledge with 25 rules. The 
partition of input variables (deviation of dynamic reactions) are defined by 5 linguistic 
variables: NEGATIVE BIG, NEGATIVE SMALL, ZERO, POSITIVE SMALL and POSITIVE 
BIG. The member functions is chosen as triangular forms. 

SAM (Stochastic action modifier) uses the recommended control torque from ASN and 
internal reinforcement signal to produce final commanded control torque p . It is defined 

by Gaussian random function where recommended control torque is mean, while standard 
deviation is defined by following equation: 

<r(£(f+l)) = l-exp(-|£(f + l)|) (9) 

Once the system has learned an optimal policy, the standard deviation of the Gaussian 
converges toward zero, thus eliminating the randomness of the output. 

The learning process for AEN (tuning of three set of weighting factors A,B,C) is 
accomplished by step changes calculated by products of internal reinforcement, learning 
constant and appropriate input values from previous layers, i.e. according to following 
equations: 

B t (t + 1) = B. (0 + j3R(t + l)AM^ mp) (t) ( 1 0) 

C.(t + l) = C j (t) + /3R(t + l)f(£A iJ (t)AM^\t)) (11) 

4>+i) = 4(0+ 

BR(t + 1)/(X 4 (0 AM/ 2 "*' )(0(1 - fCZ 4 (t)AM^ msgniCAM^ (/)) (12) 
j> A 

where p is learning constant. The learning process for ASN (tuning of antedecent and 

consequent layers of ANFIS) is accomplished by gradient step changes (back propagation 
algorithms) defined by scalar output values of AEN, internal reinforcement signal, learning 
constants and current recommended control torques. 

In our research, the precondition part of ANFIS is online constructed by special 
clustering approach. The general grid type partition algorithms perform either with 
training data collected in advance or cluster number assigned a priori. In the 
reinforcement learning problems, the data are generated only when online learning is 
performed. For this reason, a new clustering algorithm based on Euclidean Distance 
measure, with the abilities of online learning and automatic generation of number of 
rules is used. 

4.8 Hybrid intelligent control algorithm with Q reinforcement structure 

From the perspective of ANFIS Q-learning, we propose a method, as combination of 
automatic precondition part construction and automatic determination of the consequent 
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parts of a ANFIS system. In application, this method enables us to deal with continuous 
state and action spaces. It helps to solve the curse of dimensionality encountered in high- 
dimensional continuous state space and provides smooth control actions. Q-learning is a 
widely-used reinforcement learning method for an agent to acquire optimal policy. In this 
learning, an agent tries an action, CJ\t), at a particular state, x(t), and then evaluates its 

consequences in terms of the immediate reward R\t) .To estimate the discounted 
cumulative reinforcement for taking actions from given states, an evaluation function, the 
Q-f unction, is used. The Q-f unction is a mapping from state-action pairs to predict return 

and its output for state X and action a is denoted by the Q-value, Q\X, a) . Based on this 
Q- value, at time t, the agent selects an action d(t) . The action is applied to the 
environment, causing a state transition from X\l) to x(t + 1) , and a reward R(t) is 

received. Then, the Q-function is learned through incremental dynamic programming. The Q- 
value of each state/action pair is updated by 

Q(x(t), a{t)) = Q(x(t), a(t)) + a(R(t) 

+ yQ'{x{t + \)) -Q{x(t),a(t))) 
Q'{x{t + \))= max Q(x(t+l),b) (13) 

bsA(x(t+V)) 

where A(x(t + 1)) is the set of possible actions in state ; a is the learning rate; Y is the 
discount rate. 

Based on the above facts, in Fig. 8 a block-diagram of the intelligent controller for biped 
locomotion mechanism is proposed. It involves two feedback loops: (i) basic dynamic 
controller for trajectory tracking, (ii) intelligent reaction feedback at the ZMP based on Q- 
reinforcement learning structure 
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Fig. 8. Hybrid Controller based on Q-Learning Method for trajectory tracking. 
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4.9. Reinforcement Q-Learning Structure 

The precondition part of the ANFIS system is constructed automatically by the clustering 
algorithm . Then, the consequent part of this newly generated rule is designed. In this 
methods, a population of candidate consequent parts is generated. Each individual in the 
population represents the consequent part of a fuzzy system. Since we want to solve 
reinforcement learning problems, a mechanism to evaluate the performance of each 
individual is required. To achieve this goal, each individual has a corresponding Q-value. 
The objective of the Q-value is to evaluate the action recommended by the individual. A 
higher Q-value means a higher reward that will be achieved. Based on the accompanying 
Q-value of each individual, at each time step, one of the individuals is selected. With the 
selected individual (consequent part), the fuzzy system evaluates an action and a 
corresponding system Q-value. This action is then applied to the humanoid robot as part 
of hybrid control algorithm with a reinforcement returned. Based on this reward, the Q- 
value of each individual is updated based on temporal difference algorithm. The 
parameters of consequent part of ANFIS is also updated based on back propagation 
algorithm and value of reinforcement The previous process is repeatedly executed until 
success. 
Each rule in the fuzzy system is presented in the following form: 

Rule:If ;q(/) isA tl And...x n (t)isA m Thena(t)isa 1 (t) (14) 

Where x(t) is the input value, a (t) is the output action value, A is a fuzzy set and a(t) is a 
recommended action is a fuzzy singleton. If we use a Gaussian membership function as 
fuzzy set , then for given an input data X = (X l , X 2 , • • • , X ) , the firing strength O . (x) of 
rule I is calculated by 

where m .. and a ,, denote the mean and width of the fuzzy set. 

Suppose a fuzzy system consists of L rules. By weighted average defuzzification method, 
the output of the system is calculated by 

L 

a=^- L (16) 

A population of recommended actions , involving individuals is created. Each individual in 
the population represents the consequent values, a,,..,a,oi a fuzzy system. The Q-value 

used to predict the performance of individual I is denoted as 1, . An individual with a 
higher Q-value means a higher discounted cumulative reinforcement will be obtained by 
this individual. At each time step, one over these N individuals is selected as the consequent 
part of a fuzzy system based on their corresponding Q-values. This fuzzy system with 
competing consequences may be written as 
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If (Precondition Part) Then (Consequence) is 

Individual 1 (a\ ,..., a L with q ; 
Individual 2 ( a\ , . . . , a\ with q 2 



Individual N ( a, , . . . , a L with q N 

.* 

To accomplish the selection task, we should find the individual I whose Q-value is the 
largest, i.e. We call this a greedy individual, and the corresponding actions for rules are 
called greedy actions. The greedy individual is selected with a large probability \-£ . 
Otherwise, the previously selected individual is adopted again. Suppose at time , the 

individual i is selected, , i.e., actions (2 l (t ),..., O l (t) are selected for rules 1, ..., L, 

respectively. Then, the final output action of the fuzzy system is 

2>,.(x(0)a;(0 
a(t) = — L (IV) 

I>,«0) 

1=1 
The Q-value of this final output action should be a weighted average of the Q-values 
corresponding to the actions d\ if) ,...,Cl' L (t) .i.e., 

i>,(x(0)<7,(0 
Q(x(t),a(t)) = ^~ L = q,(t) (18) 

£o,«o) 

From this equation, we see that the Q-value of the system output is simply equal to g f (t) , 

the Q-value of the selected individual I . This means q, that simultaneously reveals both 

the performance of the individual and the corresponding system output action. In contrast 
to traditional Q-learning, where the Q-values are usually stored in a look-up table, and can 
deal only with discrete state/ action pairs, here both the input state and the output action are 
continuous. This can avoid the impractical memory requirement for large state-action 
spaces. The aforementioned selecting, acting, and updating process is repeatedly executed 
until the end of a trial. 

Every time after the fuzzy system applies an action Cl(t) to the environment and a 
reinforcement R(t) , learning of the Q-values is performed. Then, we should update (J^t) 
based on the immediate reward R(t) and the estimated rewards from subsequent states. 
Based on the Q-learning rule, we can update g~ as 
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q t (t) = q } (t) + a(R(t) + yQ\x{t + 1)) - q,(t)) 
Q\x(t + 1)) = max Q(x(t + \),t) 

i=l,..,N 

= max q i (t) = q,(t) (19) 

i=l,..,N ' 

That is 

q x (0 = $ (0 + a(/?(0 + fg, (*(* + 1)) - q, (t)) = q, (t) + «Aq ; (t) (20) 

Where Aq t (t) is regarded as the temporal error. 

To speed up the learning, the eligibility trace is combined with Q-learning . The eligibility 

trace for individual i at time t is denoted as Q i \t) . On each time step, the eligibility traces 

for all individuals are decayed by A , and eligibility trace for the selected individual i on 
the current step increased by 1, that as 

e t (t) = Xe t (t +\), ifi^i 

= /le 1 (/ l -l) + l. ifi = i 

A is a trace-decay parameter. The value e, (t) can be regarded as an accumulating trace for 

each individual I since it accumulates whenever an individual is selected, then decays 
gradually when the individual is not selected. It indicates the degree to which each 
individual is eligible for undergoing learning changes. With eligibility trace, (20) is changed 
to 

q } (t) = q } (t) + aAq l (t)e i (t) (21) 

for all i=l,...,N. Upon receiving a reinforcement signal, the Q- values of all individuals are 
updated by (21). 

4.10 Fuzzy Reinforcement Signal 

The detailed and precise training data for learning is often hard to obtain or may not be 
available in the process of biped control synthesis. Furthermore, a more challenging aspect 
of this problem is that the only available feedback signal (a failure or success signal) is 
obtained only when a failure (or near failure) occurs, that is, the biped robot falls down (or 
almost falls down). Since no exact teaching information is available, this is a typical 
reinforcement learning problem and the failure signal serves as the reinforcement signal. 
For reinforcement learning problems, most of the existing learning methods for neural 
networks or fuzzy-neuro networks focus their attention on numerical evaluative 
information. But for human biped walking, we usually use linguistic critical signal, such as 
"near fall down", "almost success", "slower", "faster" and etc., to evaluate the walking gait. In 
this case, using fuzzy evaluation feedback is much closer to the learning environment in the 
real world. Therefore, there is a need to explore possibilities of the reinforcement learning 
with fuzzy evaluative feedback, as it was investigated in paper (Zhou & Meng, 2000). Fuzzy 
reinforcement learning generalizes reinforcement learning to fuzzy environment where only 
the fuzzy reward function is available. 
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The most important part of algorithm represent the choice of reward function - external 
reinforcement. It is possible to use scalar critic signal (Katie & Vukobratovic, 2007), but as 
one of solution, the reinforcement signal was considered as a fuzzy number R(t). We also 
assume that R(t) is the fuzzy signal available at time step t and caused by the input and 
action chosen at time step t-1 or even affected by earlier inputs and actions. For more 
effective learning, a error signal that gives more detail balancing information should be 
given, instead of a simple "go -no go" scalar feedback signal. As an example in this paper, 
the following fuzzy rules can be used to evaluate the biped balancing according to following 
table. 



Ax (zmp) 


SMALL 


MEDIUM 


HUGE 


Ay (zmp) 








SMALL 


EXCELLENT 


GOOD 


BAD 


MEDIUM 


GOOD 


GOOD 


BAD 


HUGE 


BAD 


BAD 


BAD 



Fuzzy rules for external reinforcement 

The linguistic variables for ZMP deviations Ax [ ~'" p) and Ay ( * mp) and for external 
reinforcement R are defined using membership functions that are defined in Fig.9. 





0.3 0.6 Lf*jf 0.5 

Fig. 9. The Membership functions for ZMP deviations and external reinforcement. 



5. Experimental and Simulation Studies 

With the aim of identifying a valid model of biped locomotion system of anthropomorphic 
structure, the corresponding experiments were carried out in a caption motion studio (Rodic 
et al., 2006). For this purpose, a middle-aged (43 years) male subject, 190 [cm] tall, weighing 
84.0728 [kg], of normal physical constitution and functionality, played the role of an 
experimental anthropomorphic system whose model was to be identified . The subject's 
geometrical parameters (the lengths of the links, the distances between the neighboring 
joints and the particular significant points on the body) were determined by direct 
measurements or photometricaly. The other kinematic parameters, as well as dynamic ones, 
were identified on the basis of the biometric tables, recommendations and empirical 
relations (Zatsiorsky et al., 1990). A summary of the geometric and dynamic parameters 
identified on the considered experimental bio-mechanical system is given in Tables 1 and 2. 
The selected subject, whose parameters were identified , performed a number of motion 
tests (walking, staircase climbing, jumping), whereby the measurements were made under 
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the appropriate laboratory conditions. Characteristic laboratory details are shown in Fig. 10. 
The VICON caption motion studio equipment was used with the corresponding software 
package for processing measurement data. To detect current positions of body links use was 
made of the special markers placed at the characteristic points of the body/limbs (Figs. 10a 
and 10b). Continual monitoring of the position markers during the motion was performed 
using six VICON high-accuracy infra-red cameras with the recording frequency of 200 [Hz] 
(Fig. 10c). Reactive forces of the foot impact/ contact with the ground were measured on the 
force platform (Fig. lOd) with a recording frequency of 1.0 [GHz]. To mimic a rigid foot- 
ground contact, a 5 [mm] thick wooden plate was added to each foot (Fig. 10b). 



Link 


Length 
[m] 


Mass 
[kg] 


CM Position 








Sagittal 


Longitudinal 


Head 


0.2722 


5.8347 


0.0000 


0.1361 


Trunk 


0.7667 


36.5380 


0.0144 


0.3216 


Thorax 


0.2500 


13.4180 


0.0100 


0.1167 


Abdomen 


0.3278 


13.7291 


0.0150 


0.2223 


Pelvis 


0.1889 


9.3909 


0.0200 


0.0345 


Arm 


0.3444 


2.2784 


0.0000 


-0.1988 


Forearm 


0.3222 


1.3620 


0.0000 


-0.1474 


Hand 


0.2111 


0.5128 


0.0000 


-0.0779 


Thigh 


0.5556 


11.9047 


0.0000 


-0.2275 


Shank 


0.4389 


3.6404 


0.0000 


-0.1957 


Foot 


0.2800 


1.1518 


0.0420 


-0.0684 



Table 1. The anthropometric data used in modeling of human body (kinematic parameters 
and mass of links). 



Link 


Radii of gyration [m] 


Moments of inertia 




Sagitt Trans. Longit. 


Ix Iy Iz 


Head 


0.0825 0.0856 0.0711 


0.0397 0.0428 0.0295 


Trunk 


0.2852 0.2660 0.1464 


2.9720 2.5859 0.7835 


Thorax 


0.1790 0.1135 0.1647 


0.4299 0.1729 0.3642 


Abdomen 


0.1580 0.1255 0.1534 


0.3427 0.2164 0.3231 


Pelvis 


0.1162 0.1041 0.1109 


0.1267 0.1017 0.1155 


Arm 


0.0982 0.0927 0.0544 


0.0220 0.0196 0.0067 


Forearm 


0.0889 0.0854 0.0390 


0.0108 0.0099 0.0021 


Hand 


0.1326 0.1083 0.0847 


0.0090 0.0060 0.0037 


Thigh 


0.1828 0.1828 0.0828 


0.3977 0.3977 0.0816 


Shank 


0.1119 0.1093 0.0452 


0.0456 0.0435 0.0074 


Foot 


0.0720 0.0686 0.0347 


0.0060 0.0054 0.0014 



Table 2. The anthropometric data used in modeling of human body (dynamic parameters 
inertia tensor and radii of gyration). 
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Fig. 10. Experimental capture motion studio in the Laboratory of Biomechanics (Univ. of La 
Reunion, CURAPS, Le Tampon, France): a) Measurements of human motion using 
fluorescent markers attached to human body; b) Wooden plates as feet-sole used in 
locomotion experiments; c) Vicon infra-red camera used to capture the human motion; d) 6- 
DOFs force sensing platform -sensor distribution at the back side of the plate. 

A moderately fast walk (v =1.25 [m/s]) was considered as a typical example of task which 
encompasses all the elements of the phenomenon of walking. Having in mind the 
experimental measurements on the biological system and, based on them further theoretical 
considerations, we assumed that it is possible to design a bipedal locomotion mechanism 
(humanoid robot) of a similar anthropomorphic structure and with defined (geometric and 
dynamic) parameters. In this sense, we have started from the assumption that the system 
parameters presented in Tables 1 and 2 were determined with relatively high accuracy and 
that they reflect faithfully characteristics of the considered system. Bearing in mind 
mechanical complexity of the structure of the human body, with its numerous DOFs, we 
adopted the corresponding kinematic structure (scheme) of the biped locomotion 
mechanism (Fig. 11) to be used in the simulation examples. We believe that the mechanism 
(humanoid) of the complexity shown in Fig. 11 would be capable of reproducing with a 
relatively high accuracy any anthropomorphic motion -rectilinear and curvilinear walk, 
running, climbing/ descending the staircase, jumping, etc. The adopted structure has three 
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active mechanical DOFs at each of the joints -the hip, waist, shoulders and neck; two at the 
ankle and wrist, and one at the knee, elbow and toe. The fact is that not all available 
mechanical DOFs are needed in different anthropomorphic movements. In the example 
considered in this work we defined the nominal motion of the joints of the legs and of the 
trunk. At the same time, the joints of the arms, neck and toes remained immobilized. On the 
basis of the measured values of positions (coordinates) of special markers in the course of 
motion (Figs. 10a, 10b) it was possible to identify angular trajectories of the particular joints 
of the bipedal locomotion system. These joints trajectories represent the nominal, i.e. the 
reference trajectories of the system i. The graphs of these identified/ reference trajectories are 
shown in Figs. 12 and 13. The nominal trajectories of the system's joints were differentiated 
with respect to time, with a sampling period of At = 0.001 [ms]. In this way, the 
corresponding vectors of angular joint velocities and angular joint accelerations of the 
system illustrated in Fig. 11 were determined. Animation of the biped gait of the considered 
locomotion system, for the given joint trajectories (Figs. 12 and 13), is presented in Fig. 14 
through several characteristic positions. The motion simulation shown in Fig. 14 was 
determined using kinematic model of the system. The biped starts from the state of rest and 
then makes four half-steps stepping with the right foot once on the platform for force 
measurement.Simulation of the kinematic and dynamic models was performed using 
Matlab/Simulink R13 and Robotics toolbox for Matlab/Simulink. Mechanism feet track 
their own trajectories (Figs. 12 and 13) by passing from the state of contact with the ground 
(having zero position) to free motion state. 

I lead 
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Fig. 11. Kinematic scheme of a 38-DOFs biped locomotion system used in simulation as the 
kinematic model of the human body referred to in the experiments. 
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Fig. 12. Nominal trajectories of the basic link: x-longitudinal, y-lateral, z-vertical, cp-roll, 6- 
pitch, i|r-yaw; Nominal waist joint angles: q7-roll, q8-yaw, q9-pitch. 

Some special simulation experiments were performed in order to validate the proposed 
reinforcement learning control approach. Initial (starting) conditions of the simulation 
examples (initial deviations of joints' angles) were imposed. Simulation results were 
analyzed on the time interval 0.1 [s]. In the simulation example, two control algorithms were 
analyzed: (i) basic dynamic controller described by computed torque method (without 
learning) and (ii) hybrid reinforcement learning control algorithm, (with learning). The 
results obtained by applying the controllers (i) (without learning) and (ii) (with learning) are 
shown on Figs. 15 and Fig.16. It is evident , that better results were achieved with using 
reinforcement learning control structure. 

The corresponding position and velocity tracking erros in the case of application 
reinforcement learning structure structure are presented on Figs. 17 and 18. The tracking 
errirs converge to zero values in the given time interval. It means that the controller ensures 
good tracking of the desired trajectory. Also, the application of reinforcement learning 
structure ensures a dynamic balance of the locomotion mechanism.. 

In Fig. 19 value of internal reinforcement through process of walking is presented. It is 
clear that task of walking within desired ZMP tracking error limits is achieved in a good 
fashion. 
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Fig. 13. Nominal joint angles of the right and left leg: ql3, ql7, q20, q24-roll, ql4, q21, ql6, 
ql8, q23, q25-pitch, ql5, q22-yaw. 
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Fig.14. Model-based animation of biped locomotion in several characteristic instants for the 
experimentally determined joint trajectories. 
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Fig. 15. Error of ZMP in x-direction (- with learning - . - without learning). 
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Fig. 16. Error of ZMP in y-direction (- with learning - . - without learning). 
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Fig. 17. Position tracking errors. 




Fig. 18. Velocity tracking errors. 



Reinforcement Learning Algorithms In Humanoid Robotics 



397 




Fig. 19. Reinforcement through process of walking. 



6. Conclusions 

This study considers a optimal solutions for application of reinforcement learning in 
humanoid robotics Humanoid Robotics is a very challenging domain for reinforcement 
learning, Reinforcement learning control algorithms represents general framework to take 
traditional robotics towards true autonomy and versatility. The reinforcement learning 
paradigm described above has been successfully implemented for some special type of 
humanoid robots in the last 10 years. Reinforcement learning is well suited to training biped 
walk in particular teaching a robot a new behavior from scalar or fuzzy feedback. The 
general goal in synthesis of reinforcement learning control algorithms is the development of 
methods which scale into the dimensionality of humanoid robots and can generate actions 
for biped with many degrees of freedom. In this study, control of walking of active and 
passive dynamic walkers by using of reinforcement learning was amalyzed. 
Various straightforward and hybrid intelligent control algorithms based RL for active and 
passive biped locomotion is presented. The proposed RL algorithms use the learning 
elements that consists of various types of neural networks, fuzzy logic nets or fuzzy-neuro 
networks with focus on fast convergence properties and small number of learning trials. 
Special part of study represents synthesis of hybrid intelligent controllers for biped walking. 
The hybrid aspect is connected with application of model-based and model free approaches 
as well as with combination of different paradigms of computational intelligence. These 
algorithms includes combination of a dynamic controller based on dynamic model and 
special compensators based on reinforcement structures. Two different reinforcement 
learning structures were proposed based on actor-critic approach and Q-learning. The 
algorithms is based on fuzzy evaluative feedback that are obtained from human intuitive 
balancing knowledge. The reinforcement learning with fuzzy evaluation feedback is much 
closer to the human biped walking evaluation than the original one with scalar feedback. 
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The proposed hybrid intelligent control scheme fulfills the preset control criteria. Its 
application ensures the desired precision of robot's motion, maintaining dynamic balance of 
the locomotion mechanism during a motion 

The developed hybrid intelligent dynamic controller can be potentially applied in 
combination with robotic vision, to control biped locomotion mechanisms in the course of 
fast walking, running, and even in the phases of jumping, as it possesses both the 
conventional position-velocity feedback and dynamic reaction feedback. Performance of the 
control system was analyzed in a number of simulation experiments in the presence of 
different types of external and internal perturbations acting on the system. In this paper, we 
only consider the flat terrain for biped walking. Because the real terrain is usually very 
complex, more studies need to be conducted on the proposed gait synthesis method for 
irregular and sloped terrain. 

Dynamic bipedal walking is difficult to learn because combinatorial explosion in order to 
optimize performance in every possible configuration of the robot., uncertainties of the 
robot dynamics that must be only experimentally validated, and because coping with 
dynamic discontinuities caused by collisions with the ground and with the problem of 
delayed reward -torques applied at one time may have an effect on the performance many 
steps into the future. Hence, for a physical robot, it is essential to learn from few trials in 
order to have some time left for exploitation. It is thus necessary to speed the learning up by 
using different methods (hierarchical learning, subtask decomposition, imitation,...). 
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1. Introduction 

For a serious scientific interest or rather an amusing desire to be the creator like Pygmalion, 
human being has kept fascination to create something replicates ourselves as shown in 
lifelike statues and imaginative descriptions in fairy tales, long time from the ancient days. 
At the present day, eventually, they are coming out as humanoid robots and their brilliant 
futures are forecasted as follows. 1) Humanoid robot will take over boring recurrent jobs 
and dangerous tasks where some everyday tools and environments designed and optimised 
for human usage should be exploited without significant modifications. 2) Efforts of 
developing humanoid robot systems and components will lead some excellent inventions of 
engineering, product and service. 3) Humanoid robot will be a research tool by itself for 
simulation, implementation and examination of the human algorithm of motions, 
behaviours and cognitions with corporeality. 

At the same time, I cannot help having some doubts about the future of the humanoid robot 
as extension of present development style. Our biological constitution is evolved properly to 
be made of bio-materials and actuated by muscles, and present humanoid robots, on the 
contrary, are bounded to be designed within conventional mechanical and electric elements 
prepared for industrial use such as electric motors, devices, metal and plastic parts. Such 
elements are vastly different in characteristics from the biological ones and are low in some 
significant properties: power/weight ratio, minuteness, flexibility, robustness with self- 
repairing, energy and economic efficiency and so on. So the "eternal differences" in function 
and appearance will remain between the human and the humanoid robots. 
I guess there are chiefly two considerable ways for developing a preferable humanoid robot 
body. One is to promote in advance a development of artificial muscle that is exactly similar 
to the biological one. It may be obvious that an ideal humanoid robot body will be 
constructed by a scheme of putting a skeletal model core and attaching the perfect artificial 
muscles on it (Weghel et al., 2004, e.g.). Another is to establish some practical and realistic 
designing paradigms within extensional technology, in consideration of the limited 
performance of mechanical and electric elements. In this way, it will be the key point that 
making full use of flexible ideas of trimming and restructuring functions on a target. For 
example, that is to make up an alternative function by integrating some simple methods, 
when the target function is too complex to be a unitary package. Since it seems to take long 
time until the complete artificial muscles will come out, I regard the latter way is a good 
prospect to the near future rather than just a compromise. 
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In searching the appropriate designing paradigms for humanoid robots, it may be just the 
stage of digging and gathering many diverse and unique studies, styles and philosophies of 
the designing. After examining their practicability, reasonability and inevitability through 
the eyes of many persons over the long time, it will work out the optimised designing 
paradigms. I believe the most outstanding ingenuities in mechanical design are brought out 
in robot hands, as they are most complex mechanical systems on a humanoid robot body. 
Fortunately, I have had some experiences of designing them in each endoskeleton and 
exoskeleton styles. To contribute on such evolutive process as a mechanical engineer, I bring 
up here some of my ideas of designing robot hands. 

2. Humanoid Robot Hand in Endoskeleton Style 

2.1 Basic Design Conditions 

There are two opposite orientations in planning specifications of a robot hand. One is 
emphasizing similarity in motion and function to the human hand. Another is emphasizing 
similarity in size and weight. A robot hand following the former orientation tends to 
become rather large against the human hand. Although attempts to make it small and light 
have been done by putting the actuators away into a forearm segment and actuating the 
fingers via links or wires (Jacobsen, 1984, e.g.), the whole system is still large and heavy as 
a load at an end of arm. A robot hand following the latter orientation tends to lack functions 
as shown in general prosthetic hands. Most research robot hands, except some modern 
prosthetic ones (Harada hand, e.g.), are designed following the former orientation to enlarge 
their universality on study scenes. And it is rational estimation that the properties aimed in 
the latter orientation will be resolved spontaneously as the industrial progress leads general 
mechanical and electric elements high-performance. However, I have taken the latter 
orientation, because I believe it should be also constitutive approach to an ideal robot hand, 
moreover it is challenging for me to contrive novel designs of complex mechanisms. Then 
I have set the basic design conditions as shown on Table 1. 



1. Outline should be similar to the human hand with 5 fingers. 

2. Every motor should be embedded within the outline, and connections with outside 
controller and power source are limited to some slim and limber electric cables. 

3. Compactness and lightness should be equal to the average naked human hand. 
Concretely, mass is at most 500g, longitudinal length from fingertip to prospective 
center of wrist is at longest 200mm, and diameter of fingers is at most 20mm. 

4. As far as the third condition is satisfied, number of motors and range of finger 
movements should be enriched for realizing some practical functions like holding 
bulk object, pinching thin card, and the most sign language. As key finger motions, 
all fingers move widely from opening to clenching, and the thumb and each of 
other fingers contact opposing on both fingertips. 

5. As far as the fourth condition is satisfied, power of motors and robustness of 
mechanisms should be augmented. The desirable output force at each fingertip is 
over 5N under a straight posture. 



Table 1. The basic design conditions. 
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2.2 Kinematical Arrangement of Joints 

Fig. 1 shows the kinematical arrangement of joints in my robot hand. I identified each joint 
with two numbers and describe it as symbol J nm , in which the first suffix means finger 
number and the second one means joint number on each finger. The reason I avoid the 
popular joint nomenclature with the anatomical term like MP (metacarpal phalangeal joint) 
and DIP (distal interphalangeal joint) is that some of original human joints are complex one 
with multi degree of freedom (DOF) and hardly be substituted by a unitary joint mechanism. 




Fig. 1. The kinematical arrangement of joints. 

The joint arrangement on the four fingers except the thumb would be common to most 
humanoid robot hand. At the same time, difference of thumb structures between several 
competitive robot hands (Fig. 2) shows clearly the diverse design philosophies following 
individual purposes and available technologies. The every thumb structure on Fig. 2 has 
a complex joint that consists of a pair of revolute joints intersecting their axes at one point. 
The complex joint is an influential technology and the Shadow hand, above all, has two ones 
to realize the most similar action and appearance to those of human. On the other hand, 
I avoided such joint with estimating its complexity will require large space that violates the 
basic design conditions when it would be designed by my present technical capabilities. 
The order of each direction of joint axis and fingertip is also an influential characteristic for 
range and variety of the thumb motion. Especially in the NASA hand, to compensate 
disadvantages by low of DOF, the order is decided advisedly with considering a reasonable 
emulation of human thumb motion (Lovchik & Diftler, 1999). In my robot hand, the axis 
direction of joint J 1,3 is set for twisting the thumb tip that will conduce a stable pinching 
mentioned after. The axis directions of two joints Ji,i and Ji,2are decided mainly because of 
convenience to embed reducers with wide movable range. 
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Fig. 2. Thumb structures of competitive robot hands. 
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2.3 Compression of Independent DOF 

An "actuator assembly" that consists of a motor, an encoder and a reducer occupies 
considerable space in robot hand rigidly. Therefore, to plan a compact robot hand, it is 
a reasonable way to compress the number of equipped assemblies. This means furthermore 
compression of the independent DOF on the robot hand, so I pursued this idea boldly with 
keeping a reasonable emulation to the human hand motion. 

In the four fingers except the thumb, most robot hands are designed with the two distal 
joints J n s and J n ^ coupled-driven by one motor, since those joints in the human hand often 
move together. I furthered this idea of interlocking and made all three joints J n ,2, Jn,3 and J n ,4 
interlocked. In the same way, the relative opening-closing motion among the four fingers at 
the three root joints J2J, J4J and J5 1 is also interlocked. Moreover the joint J3 1 is eliminated 
since the human middle finger rarely moves at the joint. In the thumb, the two distal joints 
Ji,4 and J15 are interlocked and the joint J13 is fixed in a certain angle. 

Based on the consideration above, the essential independent DOF was extracted into 8. The 
member joints for each DOF are listed on Table 2(a) and named the "essential set". Although 
the essential set can realize almost all required finger motions like making a loop with the 
thumb and the little finger, it lacks sophisticated dexterity for some delicate handlings like 
pinching a thin card. At last, I have added some DOF on the essential set as listed on the 
Table 2(b) and it is named the "latest set". 

In the following two sections I explain mechanisms corresponding to the essential set, and in 
the later sections I introduce more complex mechanisms corresponding to the latest set. 
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(a) The essential set (8DOF) 
Table 2. The independent DOF set and member joints. 



(b) The latest set (15DOF) 
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2.4 Global Finger Flexion Mechanism 

As mentioned in the section 2.2 a complex joint in which two rotational axes intersect was 
avoided in the thumb mechanism, however, the other four fingers retain such complex joint 
at the root. The difficulties in developing such complex joint include how electric wirings 
are passed through the joint. The delicate wirings should not be overstretched due to joint 
rotation, so the wirings should be passed just or near the intersection point of two axes. 
I introduced a scheme that no part of an actuator assembly is placed around the intersection 
but enough empty space only for the wirings, at the same time the rotational power for the 
two axes is provided via wire or link mechanism from an off -site motors. 
Fig. 3 shows the internal structure of a basic finger, all the four fingers have the same 
mechanism. A small DC motor with a built-in encoder (Faulhaber, model 1516SR or 1319SR) 
is embedded in the largest finger segment lengthwise, like filling the segment volume. 
A reducer that consists of a crown gear train and two-stage planetary gear train is built in 
the joint J n ,3 and drives it. For optimum integration: a combination of larger and smaller 
gears is effective in uniting higher reduction ratio and efficiency, axes of all gear trains are 
arranged coaxial to that of joint J n ,3, so that volume of the joint part admits possibly largest 
diameter of gears without making the finger segment longer. In the actual reducer, the 
crown gear which diameter is almost equal to that of finger segment engages with the motor 
pinion and derives 10 times torque efficiently. In a general gear train with axes mutually 
orthogonal, the rigidity is low and the play is generated easily; so it is reasonable putting the 
crown gear train at the first stage viewed from the motor, where the transmission torque is 
low and the influence of play is negligible. Moreover the planetary gear train is constructed 
as a unique mixture of planetary-type and star-type for most compactness, so this reducer 
gets high reduction ratio 1/350 and the maximum joint torque is at least 0.5Nm. 



Sun gear 



Internal gear 

(common in two stages) 




Motor pinion 
\ Planetary gear 



Fig. 3. The internal structure of a basic finger. 



It is very important feature of a robot hand that it has smooth back-drivability at each joint 
for stable control of contacting force on a fingertip. I think it is more acceptable a slight play 
at gears as a consequence of high back-drivability than a large frictional resistance of gears 
as a consequence of elimination of the play, thanks to the general robot hand is demanded to 
perform relatively small force and low absolute accuracy in finger motions. So the actual 
fingers of my robot hand are manufactured with slight play to get most back-drivability; 
each finger can be moved passively by small force on the fingertip at most about 0.3N. 
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The interlocking mechanism among the three joints J n ,2, J n ,3and J n ^ consists of wire-pulley 
mechanisms with pulleys carved on the sidewall (Fig. 4(a)). Since they are thin enough the 
finger segments afford some internal space for motor, sensor and electric component. 
Although identical transmission can be constructed using a link mechanism, it tends to 
become larger due to restriction on facilitating smooth transmission in large rotational angle 
near 90degree. Considering a reasonable emulation of the human motion, the transmission 
ratios are set 7/10 from J n ,3 to J n ,2, and 5/7 from J n ,3 to J n ,4 respectively (same (b)). 



Figure-eight wire ropes 
connecting J n 2 and J n 3 , 
J„ i3 and ' 




78.4° 




(a) Wire-pulley mechanisms 
Fig. 4. Interlocking mechanism of the global finger flexion. 



(b) Maximum flexion range 



2.5 Abduction-Adduction Mechanism 

Since the relative opening-closing motion among the four fingers except the thumb is called 
abduction-adduction in the anatomical term, the interlocking mechanism among the three 
joints J21, J41 and J51 was named after it. Introduction of this mechanism has mainly two 
reasons as follows. The first is cutting back on the actuator assembly. The second is 
a practical idea that the ring and little fingers do not need such motion independently, 
because they on the human hand rarely take part in a dexterous pinching function. So it can 
be said that the target of this mechanism is only generating a relative opening-closing 
motion between the index finger and the middle finger. 

Rotational angle of those joints is small, so this mechanism is constructed as a link 
mechanism (Fig. 5), and in quest of efficiency and compactness it is designed as below. The 
joint J41 is selected as driven by a motor directly, so that the power transmission to other 
joints becomes the shortest route with high efficiency and low adverse influences like play. 
To obtain a large gear ratio with possibly least stages of gear train, a large sector gear that 
passes over the palm longitudinal length is adopted and fixed on the joint J4J (Fig. 6(a)), and 
the entire reduction ratio is implemented as 1/400. Thanks to the drive motor is located on 
the edge of the palm, over half of the space in the palm can be free for electric components. 
Even when each finger is bent to the maximum flexion as shown in the Fig. 4(b), the 
opening-closing motion can act independently thanks to the clearance between fingertip and 
palm; the robot hand can perform V-sign actions, e.g. freely. An electric printed circuit 
board (PCB) that contains a microcomputer for local processing, motor amplifiers, motor 
current sensors and an interface unit communicating with an outside controller through 
a serial signal line, is embedded like filling almost all the space in the palm (Fig. 6(b)). 
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Fig. 5. Connecting linkage for the abduction-adduction motion. 




Crown gear 



J 5,1 
i Connecting linkage 
\ between the finger II, IV, and V 

' Motor for the abduction-adduction motion 



(a) Location of the reduction gears and the linkage 
Fig. 6. Mechanism and electric equipment in the palm. 




(b) Embedded PCB 



2.6 Introduction of Independent Fingertip Motion 

Stably and softly pinching thin paper and handling small piece in complicated shape are the 
vital functions for a humanoid robot hand, when it could be recognised dexterous enough. 
To realize theses delicate functions, the key feature is a fine force and motion control on the 
fingertips. The human finger has excellent feature of controlling them finely in any direction, 
and furthermore it can generate large force in case of necessity. On the contrary, usual small 
actuator assembly that consists of motor and geared power train never has such large 
"dynamic range" in force control. In case of my robot finger, the main actuator assembly for 
the global finger flexion should be designed to generate possibly larger force for moving 
itself in sufficient speed against its mass and inertia, and moreover grasping a heavy object. 
Then it is hard for the actuator to be expected to generate fine force on the fingertip. 
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After considering above, I focused attention on the joint J n ,4 (J 1,5 in case of the thumb), as it 
could be free from generating the large force above; that means it could concentrate to 
generate a minute and fine force suitable for delicate pinching. Then I introduced a special 
independent DOF at the joint J n ,4 (Ji,s) so that it can control force and motion on the 
fingertip independently and finely. It is surely a little strange assignment of DOF while 
general robot hand has interlocking between the two distal joints J n ,3and J n 4 (J14 and Ji,s), 
however, I think that it maintains the basic design conditions as general harmony as 
a humanoid robot hand; the human finger also has latent ability to move its terminal joint 
independently. 

In addition, I propose a new strategy of the handling that combines a powerful grasping and 
a delicate pinching in consideration of that the small additional actuator at the joint J n ,4 (Ji,s) 
cannot sustain large contacting force at the fingertip. 

In power grasping (Fig. 7(a)), powerful grasping forces generated by the main actuator are 
transmitted to an object through the surface of finger segments except the fingertip. At the 
same time, the delicate force generated by the joint J n ,4 (Ji,s) lets the fingertip compliant to 
the object, and enlarge the effect of slip-free grasping. The movable range of the joint J„4 
(Ji,s) is broadened enough into the backside, so that the function of fingertip can work well 
in wide range of its posture. The shape of surface around the terminal joint is formed to be 
suitable for the direct contacting against the object. 

In delicate pinching (Fig. 7(b)), the contacting forces on the object are generated only by the 
joint J n ,4 (Ji,5). When larger pinching forces that the joint J n ,4 (Ji,s) cannot sustain are 
necessary, rotating the joint to the dead end of movable range where a mechanical stopper 
sustains the large force generated by the main actuator; strong pinching is executed. 
This strategy can be categorized as a general idea for a two-stage mechanism that compiles 
separated rough and fine control methods for enlarging the total dynamic range. 



Buik object 



Delicate tip forces 





(a) Power grasping (b) Delicate pinching 

Fig. 7. New strategy of object handling with independent fingertip motion. 
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2.7 Independent Terminal Joint Mechanism 

The mechanical feature necessary for the new handling strategy above is a smooth 
compliance of fingertip to an object, in concrete terms a following motion to keep a soft 
contacting force. Although a similar feature could be produced with adding a spring on the 
way of power transmission to the fingertip, the spring occupies not negligible space in the 
small finger segment. Another way introducing a force sensor on the fingertip is also hard in 
acquiring appropriate sensor and tuning up a force feedback controller robust. So the most 
practical way to realize the compliance is exploiting a simple open-loop torque control 
method of the motor only by observing its current. 

In this case, it requires a high efficient reducer with low frictional resistance and play of 
gears. I adopted a gear train that uses only high precision spur gears with expectation of 
their high efficiency. The motor length is restricted to the width of the finger segment, 
because the motor must be laid widthwise to be parallel to the joint axis. I searched on the 
web a possibly higher performance motor that fits into the limited length, and found a small 
DC motor (Sanyo Precision, model KS-2717) in production for small electric appliances. 
Since the output torque of the motor is very small 0.5Nmm in maximum, the reduction ratio 
is needed to be possibly higher. To realize high gear ratio avoiding deterioration of 
efficiency and inflation of volume, a two-stage thin gear train is built in against a sidewall 
with combining possibly largest and smallest gears for maximizing the gear ratio (Fig. 8); 
the gear module is 0.16 and the number of smallest gear's teeth is only nine. Consequently, 
the gear ratio is 3/250, and the maximum fingertip force is almost 2N. 

A thin potentiometer is introduced to measure the angular displacement and put against 
another sidewall. By this alignment of the reducer and the potentiometer on the both 
sidewalls, the finger segment retains enough empty space inside. 




Fig. 8. Actuator assembly for the independent terminal joint. 



To confirm the smooth compliance of a fingertip, a primary experiment was conducted. 
Concretely I examined active spring effect of the joint J n ,4 generated as follows. The joint J n ,4 
was always position-controlled to keep its angular displacement n ,4 at a certain target. The 
original target was set 0, that meant the terminal segment and middle segment were in line. 
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When the motor current of the joint J „4 exceeded a certain threshold, the target was shifted 
gradually to a temporary target by adding/subtracting a small variation proportional to the 
difference between the threshold and the present motor currents, so that the excess of motor 
current got reduced. When the motor current fell below the threshold and besides the 
temporary target was different from the original one, the temporary target was restored 
gradually to the original one by adding/ subtracting a small variation proportional to the 
difference between the latest temporary target and the original one. In consequence of 
repeating this control in high frequency, the active spring effect was created stably. 
In the experiment, the external pressure for examining the compliance was provided by the 
other finger part. Concretely, the finger part between the joints J n ,2 to J n ,4 was moved 
repeating a slight sine wave motion by a position-control of the main motor, and kept 
pressing the fingertip against a fixed object. Contacting force between the fingertip and the 
object was measured using a film-like force sensor (Nitta, FlexiForce) placed on the object; 
as a matter of course the observed force was not used in the motor control. 
Fig. 9 shows the result of experiment with transitions of the contacting force and the angular 
displacement # n ,4- The threshold of motor current was set in two values as examples that 
provide the limits of contacting force as about 0.9N and 1.4N respectively. Effectiveness of 
the smooth compliance of fingertip is confirmed by analysing each transition as follows. 
When the contacting force just exceeded the limit, the joint J n ,4 started to rotated to release 
the excess, so that the contacting force was stuck at the limit. The joint J n ,4 started to return 
to the original position when the contacting force fell below the limit. These movements are 
just the characteristics of desirable compliance. And stable constancy of the maximum 
contacting force at the limit reveals high efficiency of force transmission through the reducer, 
and prove adequacy of the estimating way of contacting force from the motor current. 
This smooth compliance cannot be produced by a simple spring mechanism because it 
cannot change the output force freely, then this practical function with minimum additional 
parts is one of the distinctive features on my robot hand. 
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Fig.9. Experimental result of the contacting force and the angular displacement of a fingertip. 
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2.8 Thumb Twisting Mechanism 

To fully maximize the capability of the smooth compliance generated by the independent 
terminal joint mechanism, the direction of fingertip force should be oriented rightly in the 
normal line of contacting surface on an object. This manner furthermore increases the 
stability of pinching in pressing the surface by entire flat pad on the fingertip. In order to 
realize this manner concretely, either or both of cooperating fingers in a pinching need to 
twist its fingertip along its longitudinal axis in a considerable angular displacement. Then 
I gave the thumb the distinctive joint J 1,3 for this twisting function. 

Fig. 10(a) shows 5 segments in the thumb; each segment is identified with the number 
starting from the root, and described as symbol Si, n . In the primitive robot hand with the 
essential 8DOF, the two segments Si,2 and S13 are integrated by fixing the twist angle. 
Avoiding deterioration of the original mechanical harmony, the actuator assembly for the 
joint J 1,3 was designed additionally. To minimize the joint J 1,3 volume, cylindrical case of the 
motor for the joint J-i^that is already embedded over segments Si,2 and S13 is exploited as 
the axis of Ji,3. Unlike the terminal joint mechanism, this twist function requires relatively 
large motor for moving large mass of the finger structure, so a bulge is added on the 
segment S12 to mount a motor same as that for the global finger flexion (Fig. 10(b)). 
Although this bulge is a little unbecoming, I accept it as a reasonable way for preventing 
stretch of the thumb length. The movable range of twisting is 45degree. 

By the way, the two joints Ji,i and Ji,2have the same actuator assembly for the global finger 
flexion shown in the Fig. 3, and each has adequate movable range of over 120degree and 
60degree respectively. 




(a) Segments in the thumb 
Fig. 10. Thumb mechanism with twisting function. 



Motor j, ,, 

(b) Actuator assembly 
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2.9 Independent Root Joint Mechanism 

In the four fingers except the thumb, since the both joints J n ^ and J n ,3 need no little power in 
the global finger flexion, the idea of interlocking these two joints and actuating them by one 
relatively large motor has adequate rationality, as far as the finger has no more capacity to 
accept two motors for actuating them independently. However, in some cases, the 
independent motion of each joint is required to realize some slight motion like adjusting the 
contacting place of a fingertip on an object. In order to demonstrate my technical capability 
to realize such complex requirement additionally, an actuator assembly was introduced at 
the joint J2,2 particularly. 

As a matter of course there is no capacity to accept a large motor, the additional motor is 
selected as the same small one driving the terminal joint. As the global finger flexion should 
be generated by the existing mechanism, the additional small actuator assembly should be 
designed to generate a differential motion as being overlapped on the global finger flexion. 
Well, the pulley on the joint J2,2 is existing as a basement of the global finger flexion and its 
shape is round and coaxial to the axis of joint J2,2, so it is convenient for realizing the 
differential motion by rotating the pulley around the axis. 

Fig. 11 shows the actuator assembly to rotate the pulley. To sustain the large torque around 
the joint J2,2 for the global finger flexion, it needs possibly larger reduction ratio. Therefore 
a worm gear train, that generally has large gear ratio, is introduced, so that the entire 
reduction ratio gets 1/1000. Although a worm gear train has no back-drivability, it is also an 
advantage in this case because that gear train can support any large torque in case of 
necessity. The movable range of the pulley is +15 to -15degree that makes useful adjusting 
motion at the fingertip in 10mm order. 
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(a) Worm gear mechanism to drive the pulley 



(b) Actual embedded situation 



Fig. 11. Differential mechanism for the independent root joint motion. 
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2.10 Smart Wiring for Bypassing Reducer 

The quality of a robot system is evaluated from many kinds of dimension including 
neatness of the electric wiring, since its weight and volume can bring recognizable 
deterioration in the performance of high-speed motion and indisputably deteriorate the 
appearance. The lack of space for containing the wiring is the most common cause of this 
problem because expelling the wiring outside makes its weight and volume to increase. In 
my robot hand, as mentioned in the section 2.4, the discussion about the designing root joint 
structure of each finger was started by consideration of this problem. And more problem is 
outstanding around the joint filled with the large reducer of ratio 1/350 meaning Jij, Ji,2, 
Ji,4, J2,3, J3,3, J4,3 and J53. Recognizing the significance of this problem, a unique and 
practical design of wiring is introduced. 

The role of the wiring is electric connection between the motor and sensor for the terminal 
joint and the main PCB in the palm, and a thin flexible PCB with 3.5mm width makes it. 
When the wiring is led as going around the reducer's circular outline, the change of shortest 
path length due to the finger flexion is remarkable, and then the method to retract and 
extract the corresponding length of wiring becomes the practical problem. My robot hand, 
fortunately, has enough margin space in the finger segments, and it can be formed an empty 
space where the wiring can adapt to the change of path length with changing the curving 
line by itself as shown in Fig. 12. 

By the way, this wiring style cannot be adopted on the two thumb root joints Jij and Ji,2 
because of lack of the internal space, and then the wirings through these joints are forced to 
go outside in a wide circle unbecomingly. This problem will be solved in the next 
development step waiting for an investment opportunity. 



fO ?fl 3fl 40 50 BO 




(a) Change of wiring path due to the finger flexion (b) Flexible PCB 

Fig. 12. Design of the wiring around the joint that contains the large reducer. 
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2.11 Overall view of the Humanoid Robot Hand 

As a conclusion of all previous considerations the latest model of my robot hand is built up 
as shown in Fig. 13; it has 15DOF as defined on the Table 2(b) while it satisfies the basic 
design conditions on the Table 1. The total mass including the internal electric equipment 
except the long cable connecting outside controllers is just 500g. The connections to outside 
systems are only 02.4 signal cable and 04.5 power cable. Some dimensions of details like 
the length of each finger segment are referred to my hand. 




Fig. 13 Overall profile of the latest model. 
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To confirm dexterity of the robot hand, some experiments of representative and practical 
handling motions were conducted; this paper displays two handling types: pinching a 
business card and holding a pen (Fig. 14). The key evaluation items in these experiments 
were the two distinctive functions: the smooth compliance on a fingertip and the twisting of 
the thumb. All the fingertip forces were generated by the simple open-loop torque control 
method explained in the section 2.7 without force sensors. 

By the way, the smart wiring style explained in the section 2.10 is installed only to the latest 
model, and the robot hand used in the experiments did not have it unfortunately. 




(a) Pinching a business card 
Fig. 14 The representative and practical handling motions. 



(b) Holding a pen 



In the experiment of pinching a business card, the robot hand performed switching several 
times two couples of pinching fingers: the thumb and the index finder/ the thumb and the 
middle finger (Fig. 15). In the junction phase when all the three fingers contacted on the card, 
the thumb slid its fingertip under the card from a position opposing a fingertip to another 
position opposing another fingertip. In the experiment of holding a pen, the robot hand 
moved the pen nib up and down and sled the thumb fingertip along the side of the pen (Fig. 
16). In both experiments, the objects: card and pen were held stably, and these achievements 
prove the contacting force appropriate in both strength and direction could be generated at 
each fingertip. 




Fig. 15 Cyclical steps in the experiment of pinching a business card. 
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Fig. 16 Cyclical steps in the experiment of holding a pen 



At the SIGGRAPH 2006, I got an opportunity to join into a participating party of the 
"Hoshino K. laboratory in the university of Tsukuba" which introduced my humanoid robot 
hand for the first time. The robot hand was demonstrated on a humanoid robot arm that is 
actuated by pneumatic power, and has 7DOF wide movable range, slender structure and 
dimensions like an endoskeleton of a human arm (Fig. 17). While its power is low and the 
practical payload at the wrist joint is about 1kg, it could move the robot hand smoothly. 
The conclusive advantage of the robot hand is that many complex functions are condensed 
in the humanlike size, weight and appearance, and realize the sophisticated dexterity. As 
the robot hand has rich suitability for delicate robot arms, after more sophistication, it will 
be developed to a good prosthetic hand in the near future. 




Fig. 17 Demonstration in the international exhibition SIGGRAPH 2006 in Boston. 
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3. Master Hand in Exoskeleton Style 

3.1 Introduction of Circuitous Joint 

As a dream-inspiring usage, the dexterous humanoid robot hand will be employed into 
a super "magic hand" with which an operator can manipulate objects freely from far away 
and get feedback of handling force and tactile sensations. Such intuitive master-slave control 
method of a humanoid robot with feedback of multi-modal perceptions is widely known as 
the Telexistence/ Telepresence, however, developments of adequate master controllers for 
them have been rare in comparison with slave humanoid robots. I guess one of major 
reasons is a difficult restriction in mechanical design that any mechanism cannot interfere 
operator's body. To solve this problem an idea of exoskeleton is brought up by association 
of a suit of armour that can follow wide movable range of human body with covering it. 
The most popular and practical master hand in exoskeleton style is the CyberGrasp, and 
most conventional master hands in exoskeleton style have the similar structure to it. They 
are designed to be lighter and slenderer with less material, so they have no core structure 
and cannot sustain their form as a hand without parasitism on operator's hand. This means 
they gives some constriction feeling to the operator and the slight force sensation in the 
feedback is masked. Then I have tried to design an ideal exoskeleton that fulfils every of 
lightness, slenderness and self-sustainability in its form. 

In designing such exoskeleton, the main theme is focused on joint mechanisms. The most 
practical joint is a revolute one that consists of an axis and bearings, and general ways to 
place it corresponding to an operator's joint are in parallel on backside or in coaxial beside. 
However, the former tends to deteriorate the movable range of operator's joint (Fig. 18(a)) 
and the latter cannot find an existing space between operator's fingers. Therefore I propose 
a novel joint mechanism named "circuitous joint" that has a virtual axis coincided with the 
axis of operator's skeleton while the all mechanism exists on backside of operator's finger. 
Technically this virtual axis is the instantaneous center of relative rotation of two segments. 
Fig. 18(b) shows the principle of the circuitous joint that realizes the virtual axis by 
stretching displacement s of two segments in proportion to the joint angular displacement 9. 
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(a) Ordinary revolute joint 
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(b) Circuitous joint 
Fig. 18 Behaviour of two types of revolute joint in following operator's finger. 
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3.2 Fundamental Mechanism of the Circuitous Joint 

In order to realize the principle of the circuitous joint mentioned above, rack and gearwheel 
mechanism was adopted in consideration of high rigidity of structure, certainty of motion, 
and facility of manufacturing. Fig. 19 shows the fundamental mechanism prepared for 
a principle study. A gearwheel is rotated on a rack by relative rotation of two segments, and 
shifting of its axis provides stretching of a segment that has the rack (Fig. 20). Since the two 
segments should make same stretching displacement together, two sets of the mechanism 
are combined in opposite direction. The gearwheel is formed to be sector gear by removing 
unnecessary part. We may note, in passing, this mechanism is an "over-constrained" 
mechanism, so it can keep its behaviour even without the actual axis. 



Segment A 
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Fig. 19 The fundamental mechanism as a unit of the circuitous joint. 
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Fig. 20 Mixed motion of rotating and stretching of two segments. 
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3.3 Kinematical Design of the Optimal Circuitous Joint 

To make the virtual axis coincide exactly to the axis of operator's skeleton, the relationship 
between the angular displacement and the stretching displacement s must be non-linear. 
This means the rectilinear rack and the circular gearwheels should not be adopted, however, 
they can get practical use with optimal specifications calculated as follows. 



Segment A 



(=r&) 




Fig. 21 Kinematical symbols in the circuitous joint. 

Fig. 21 shows definition of kinematical symbols of parts and parameters; for example, point 
V is the virtual axis. The specifications that provide the shape of rack and sector gear are 
only the pitch circle radius r of the sector gear and the standard offset p between the center- 
lines of the Segment A and the Bone A. Since the standard offset p is decided 10mm due to 
convenience of practical design of mechanism, only the radius r is an object to be optimised. 
The point V moves on the Y-axis by change of and its behaviour is divided into three types 
according to the size of r (Fig. 22). Considering its nearest trajectory to the point C, the 
preferable range of r is presumed as 0.5p < r < (2/ 7t)p. 
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Fig. 22 Motion of the virtual axis V on the Y-axis by change of 0. 

The evaluation item for the optimisation was set a deviation d defined by next formula that 
means deformation of kinematical relationship between two datum points A and B as 
shown in the Fig. 21, and the optimal radius r should minimise it. 



d = Ju +{p-v) {where u = -r0cos0 + psm0-r0, v = pcos0+r0sm0} 



(1) 



Fig. 23 shows curves of the deviation d vs. 0'm several settings of the radius r. The radius r is 
set within the presumed range. To generalise the optimisation each parameter is dealt as 
dimensionless number by dividing with the offset p. Screening many curves and seeking 
a curve which peak of d during a movable range of #is minimum among them, the optimal r 
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is found as the value that makes the sought curve. For example, when the movable range is 
< 0< n/T. the optimal radius r is 0.593p and the peak deviation d is 0.095p, and when the 
movable range is < ff< k/3 the optimal radius r is 0.537p and the peak deviation d is 0.029p. 
As the offset p is set 10mm, the peak of d is below acceptable 1mm; therefore, the mechanism 
with rectilinear rack and circular gearwheels has practicability enough. 



0.593 




Fig. 23 Variation of curves of the deviation d. 

3.4 Driving Method of the Circuitous Joint 

To design the joint mechanism light and slender, a method to drive it from away via a wire 
rope is introduced. The wire rope is set along two segments veering by a pulley on the 
sector gear's axis, and one end is fixed on a segment and another end is retracted/extracted 
by a winding drum set at a stationary root (Fig. 24(a)). Since the wire rope can generate only 
pulling force that rotates the joint in straightening direction, a spring is added to generate 
pushing force that rotates it in bending direction (same (b)). This driving method has further 
conveniences to be applied to a tandem connection model (same (c)). A wire rope to a distal 
joint from the root can be extended easily through other joints. Its tensile force shares 
accessorily a part of driving force of other joints they are nearer to the root and need 
stronger driving force. Moreover, a coupled-driving method of plural joints can be realized 
only by winding their wire ropes together with one drum. The rate of each rotation can be 
assigned separately by independent radii on the drum. 




(a) Path of the wire rope (b) Pushing spring 

Fig.24 Driving method of the circuitous joint. 



(c) Tandem connection 
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r p : Radius of the pulley (constant) 

k : Spring constant of the compression spring (constant) 

Fs : Spring force generated by the spring (intermediate variable) 

Fs' : Spring force generated by the spring when 0=0 (constant) 

W : Retracting/ extracting displacement of the wire rope (input variable) 

F : Pulling force of the wire rope (input variable) 

6 : Joint angular displacement (output variable) 

T : Joint torque (output variable) 

Fig. 25 Statical symbols in the circuitous joint. 

The definition of statical symbols is shown in Fig. 25, and the formulas for inverse statics 
calculating the input (manipulated) variables: w and F, from the output (controlled) 
variables: #and rare derived as follows. 



F = - 



w = (2r + r P )t 
2k -r 2 



2Fs' 



2r + r P 2(2r + r P ) 2r + r P 



(2) 
(3) 



As these formulas show simple and linear relationship between the input and output 
valuables, this driving method promises further advantage that the algorithm of controlling 
both position and force is fairly simple. When the spring effect is negligible, as the second 
and third terms on the right side of formula (3) are eliminated, we would be able to control 
the output torque rby using only the motor torque as the controlled variable. 



3.5 Master Finger Mechanism (MAF) 

Fig. 26 shows the practical master finger mechanism (MAF hereafter) corresponding to 
a middle finger of my hand and my humanoid robot hand, and proves the mechanism can 
follow them in wide movable range from opening to clenching. MAF is constructed with 
three discrete joint units, so that they are connected adapting to various pitch of operator's 
finger joints (Fig. 27). To make MAF narrow and short enough, each unit is designed 
possibly thin and aligned with partly overlapping. In this instance, all joints are coupled- 
driven by one relatively large motor (Faulhaber, model 1724SR). 

As shown in Fig. 28, the actual rack is placed in opposite side viewed from the axis in 
comparison with the previous illustrations. The reason is to dissolve the interference 
between the mechanism and operator's finger that has came up in the previous 
arrangements. Inverse gear is added to correct the stretching direction of each segment and 
carried on a slider to keep the position at midpoint of the rack and the sector gear. 
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Fig. 26 Master finger mechanism (MAF) following various finger flexions. 
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Fig. 27 Adjustable tandem connection of three joint units 
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Fig. 28 Internal mechanism of the joint unit. 
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3.6 Master-Slave Control with Encounter-Type Force Feedback 

As an ideal scheme of force display to the operator, the "encounter- type" has been proposed 
(McNeely, 1993, Tachi et al., 1994); that means a small object held up by a robot arm is 
approached and pressed to a part of operator's body where tactile sensation is necessary as 
occasion demands. Its chief advantages are making the operator to discriminate clearly the 
two phases "non-contact" and "contact", and free from constriction feeling during the non- 
contact phase. As it is suitable for the feature of my desired master hand, MAF introduced a 
function of non-contact following to the operator's finger. 

Since the present MAF has only 1DOF, the target motion of operator's finger is reduced to 
same 1DOF, and a gap between both fingertips of MAF and the operator is set as the 
controlled variable during the non-contact phase. Concretely, a sensor at fingertip of MAF 
measures the gap, and MAF is position-controlled to keep the gap at the desired value 2mm. 
Fig. 29 shows the fingertip assembly that contains a micro optical displacement sensor 
(Sanyo Electric, SPI-315-34), technically that detects motion of a swing reflector moved by 
the operator's nail in slight force, and the gap is presumed from the motion. 
During the contact phase, on the other hand, MAF should generate a desired contacting 
force against the operator's fingertips at the contact tip of the fingertip assembly. So a film- 
like force sensor (Nitta, FlexiForce) on the contact tip measures the contacting force, and 
MAF is force-controlled by changing the motor torque of winding the rope in proportion to 
the difference between the measured and desired contacting forces. 

An experimental master-slave system between MAF and a slave humanoid robot finger 
(SLF hereafter) was constructed as follows. SLF is always position-controlled to realize the 
same motion of MAF. The two phases of contact/ on-contact on controlling MAF are 
switched according to detecting existence/non-existence of the contacting force on SLF. 
A film-like force sensor on the surface of SLF's fingertip measures the contacting force, and 
the desired contacting force that MAF should generate is given as equal to that of SLF. 




Fig. 29 Fingertip assembly for the master finger mechanism (MAF). 

In order to confirm practicability of the master-slave system, an experiment was conducted. 
Fig. 30 shows the coupled motion of MAF and SLF in the non-contact phase; MAF was 
following the operator's finger with keeping a small gap at the fingertips. MAF and SLF 
could follow the operator's finger exactly as high as a less drastic speed. Since MAF had 
only 1DOF, SLF was prepared as the 1DOF mechanism interlocked all three joints. Moreover, 
the operator should also make his/her finger motion interlocking the three joints roughly 
similar to the behaviour of MAF. Though, I could forget an uncomfortable feeling by the 
fixed behaviour after familiarization, and enjoyed this experience. 
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Fig. 30 Circumstance of the experimental master-slave control. 
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Fig. 31 Experimental result of transferring the contacting force. 

Fig. 31 shows an experimental result; 01+02+03 means the sum of three joint angular 
displacements on MAF. The two vital features are shown: prompt switching of contact/non- 
contact phases, and transferring the contacting force from SLF to MAF. The contacting force 
at the fingertip of SLF was given by an assistant pushing on it; for example, the two contact 
phases at the time 7s and 10s were caused by assistant's tapping. While the algorithm 
switching the phases was a primitive bang-bang control, an oscillation iterating contact/ 
non-contact did not occur. I guess the season: since the gap between the fingertips is kept 
small during the non-contact phase, the impact at the encounter that will lead the oscillation 
is not so serious, moreover the human fingertip has effective damping to absorb it. 
As shown by the curves after the time was 13s, the operator's finger could be moved by the 
assistant's force; the master-slave (bilateral) control with force feedback was verified. 
In conclusion of this experiment, MSF has enough performance as a principal part of the 
master hand for the Telexistence/ Telepresence. 
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3.7 Overall view of the Master Hand 

Since it comes to the end of width of this paper, I describe briefly the overall view of the 
master hand. By the way, the nomenclature of each joint is same as shown in the Fig. 1. 
I gave four fingers to the master hand (Fig. 32); the little finger was omitted due to its little 
worth in general activities. The three finger mechanisms are same as shown in the Fig. 26, 
and the second and fourth finger have the abduction-adduction motion with active joints at 
J2.1 and J41. The each joint is position-controlled to follow lateral motion of the operator's 
finger detected at fingertip with similar sensor mechanism as shown in the Fig. 29; however, 
the additional sensor put beside the fingertip is omitted in the Fig. 32. 

In the thumb mechanism, the distal three segments are constructed with two circuitous 
joints at Ji,4 and J1.5. At the same time, elated ingenuity is exercised to design the joint 
mechanism corresponding to the carpo-metacarpal (CM) joint of operator; to make the two 
joint axes Ji,i and J1.2 intersected in an empty space for containing the CM joint, a slider 
mechanism is introduced where a motor-driven carriage runs on a sector rail in a wide circle. 
While the two joint axes Ji,3 and Ji.4for the MP joint are not intersected, the order of each 
direction of joint axis and fingertip is identical to that of the Shadow hand (Fig. 2). 
In the non-contact phase, the thumb mechanism is position-controlled to follow the 
operator's thumb opposing on both fingertips; each independent DOF has individual sensor 
similar to the previous one. As the mechanism does not touch the operator's thumb, slight 
deviation of the controlling is negligible. In the contact phase, only the joints Ji,4 and J-i.sare 
switched its control mode to the force-control. More sophisticated control algorithm for this 
thumb mechanism is under study in the "Tachi S. laboratory of the university of Tokyo" 
where I started developing this master hand as a researcher in 2001. 




Fig. 32 Whole picture of the master hand. 
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4. Conclusion 

To contribute on the evaluative process of searching the appropriate designing paradigms as 
a mechanical engineer, I bring up in this paper some of my ideas about the robot hand 
design concretely. While the designs of my robot hands may seem to be filled with eccentric, 
vagarious and serendipitous ideas for some people, I believe they are practical outcomes of 
flexible ingenuity in mechanical designing, so that they can take on pre-programmed but 
robust actuating roles for helping the programmable but limited actuators, and realize 
higher total balance in mechatronics. At the same time, for examining their practicability, 
reasonability and inevitability through the eyes of many persons, it will need to establish 
a standard definition and evaluation items in kinematics, dynamics, control algorithms and 
so on, that can subsume almost all humanoid robots. Concretely, a standard formats would 
be prepared to sort and identify any robot system by filling it. The Fig. 1 and 2 show my 
small trial of comprehensive comparison under a standard definition in the robot hand 
kinematics. And I hope the worldwide collaboration, so that it will promote developments 
of many sophisticated mechanical and electric elements that are easy to be used by many 
engineers like me who want any help to concentrate on his/her special fields. 
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1. Introduction 

Recently, robotics research has focused on issues surrounding the interaction modalities 
with robots, how these robots should look like and how their behavior should adapt while 
interacting with humans. It is believed that in the near future robots will be more prevalent 
around us. Thus it is important to understand accurately our reactions and dispositions 
toward robots in different circumstances (Nomura et al., 2006). Moreover, the robot's correct 
production and perception of social cues is also important. Humans have developed 
advanced skills in interpreting the intentions and the bodily expressions of other human 
beings. If similar skills can be acquired by robots, it would allow them to generate behaviors 
that are familiar to us and thus increase their chances of being accepted as partners in our 
daily lives. 

The expressiveness of a gesture is of great importance during an interaction process. We are 
often required to give special attention to these signs in order to keep track of the interaction. 
Humans have learned to adapt their behavior and to react to positive and negative bodily 
expressions (Bartenieff & Lewis, 1980). Although there has been remarkable work on the 
design issues of sociable robots (Breazeal, 2002) and affective autonomous machines 
(Norman et al., 2003), there has not been much work on investigating the real impact of 
robot bodily expressions on the human user in the context of human-robot interaction. 
Knowing the effect of a generated gesture, a robot can select more accurately the most 
appropriate action to take in a given situation. Besides, computer-animated characters have 
been used to evaluate human perception of the significance of gestures. However, animated 
characters and embodied ones should be treated differently since the latter are tangible 
entities (Shinozawa et al., 2005). 

In this article we report a study on the relation between bodily expressions and their 
impacts on the observer. We also attempt to understand the effect that expressions have on 
the observer's brain activity. Its sensitivity to bodily expressions can be used during an 
interaction task since the brain is the source of every cognitive and emotional effort. 



1 Corresponding author: Abdelaziz Khiat, Robotics laboratory, Graduate School of Information Science, Nara Institute of 
Science and Technology, Keihanna Science City, 630-0192 JAPAN. Email: khiat@ieee.org 
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Fig. 1. Considered scenario for robot bodily expressions and its perceived impression. 

In this work, we have conducted an experimental study where several users were asked to 
observe different robot bodily expressions while their brain activity was recorded. The 
results suggest the existence of a relation between the type of bodily expressions and the 
change in the level of low-alpha channel of brain activity. This result helped in the selection 
of features that were used to recognize the type of bodily expression an observer is watching 
at a certain time. The recognition rate was of about 80% for both cases of robot bodily 
expressions and of human bodily expressions. Potential applications include customized 
interface adaptation to the user, interface evaluation, or simple user monitoring. 

2. Bodily expressions and their impressions 

The considered scenario for this study is depicted in Fig. 1. First, we have a robot that is 
executing a series of movements. It transmits to the observer a meaningful expression which 
is called bodily expression ©. Second, we have a human observer that perceives the 
expression and interprets it using his/her a priori knowledge ©. Then, the observer gets an 
impression, which means that bodily expression affects him/her to a certain level, 
depending on its strength, his/her awareness or attention and his/her state of mind or 
mentality (D. It is important to emphasize the difference between how the observer 
perceives and interprets a bodily expression, and what impact this expression evokes in the 
observer. It is expected that the two are related, but there is no information about the nature 
of this relation or how it evolves and changes over time. One of the goals of this work is to 
clarify and explain certain aspects of this relation to open the possibility of generating an 
adaptive robot behavior based on this information. 
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Fig. 2. The subset of Shaver's classification of emotions used in the categorization of Bodily 
Expressions. 
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2.1 Classification of bodily expressions 

There is a need to classify bodily expressions generated by a robot in order to investigate 
their effects on the user. For this reason, salient differences among motions should be 
implemented. During an interaction process, humans go through different affective states, 
depending on several conditions such as degree of engagement, degree of awareness, and 
degree of interest among others. It is thus possible to classify every action taking place 
during an interaction process into the emotional effects that it would have on the observer. 
We adopted a simplified version of Whissel's wheel of activation-evaluation space described 
in (Whissel, 1989). We used the fact that we have two primary states for emotions: positive 
and negative ones, also known as pleasant and unpleasant emotions. The considered 
emotions are the following: happiness, surprise, sadness, and anger. In order to categorize 
these emotions we used a subset of Shaver's classification (see Figure 2), where happiness 
and surprise represent pleasant emotions while sadness and anger represent unpleasant 
emotions (Shaver et al., 1987). Bodily expressions were classified using one of the specified 
four emotions as pleasant or unpleasant. 

2.2 Generation of robot bodily expressions 

The humanoid robot ASKA (Ido et al., 2002) used in this study is shown in Figure 3. The 
body has a mobile platform and two arms and is based on the commercial robot TMSUK-4 3 . 
The head is a replica of the Infanoid robot (Kozima, 2002). This humanoid robot with its 
mobile platform has the advantage of being able to generate relatively fast motions 
compared to the currently available biped humanoid robots. 

Since the pioneering work of (Johansson, 1973) on visual perception of biological motion, it 
has been known that humans can perceive a lot of information from body movements 
including the emotional state of the performer (Allison et al., 2000; Pollick et al., 2001). 
Recently, there is a growing interest in mathematically modeling emotion-based motion 
generation for real-world agents such as robots (Lim et al., 2004) and for virtual agents such 
as animated characters (Amaya et al., 1996). To be able to generate bodily expressions that 
reflect the selected emotions we rely on Laban features of movements (Bartenieff & Lewis, 
1980). It has been shown by (Tanaka et al., 2001) that the qualitative Laban features of Effort 
and Shape correlate with the four basic emotions we have selected in section 2.1. 
Based on the mathematical description of Laban features, shown in the Appendix, it is 
relatively easy to classify bodily expressions that reflect a certain emotion. Although there is 
no unique solution to this problem, the goal is to be able to generate a representative bodily 
expression for each one of the selected emotions. 

The generated bodily expressions (BE) which reflect one of the basic emotions of happiness, 
surprise, sadness, anger or none are the following: 

• BE1: The robot raises both arms and turns its body to the left, then to the right, 
twice. The goal is to show an expression of happiness. 

• BE2: The robot raises its right hand and moves it in an arc toward the right side, 
then goes back to its initial position and lowers its right arm, the goal is to show an 
expression of no particular emotion. 

• BE3: The robot raises both arms and its head, then moves backward for some 
distance, the goal is to show an expression of amazement or surprise. 



3 TMSUK-4 is a trademark of tmsuk Co. Ltd, Kitakyushu. 
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BE4: The robot lowers both arms and its head, then moves backward at low speed 

for some distance, the goal is to show an expression of sadness. 

BE5: The robot raises both arms gradually while advancing before stopping, then it 

lowers and raises its arms progressively for two cycles; the goal is to show an 

expression of happiness. 

BE6: The robot advances quickly, then goes back and raises its right arm while 

turning its head a bit to the right. It then lowers its arm and returns its head to the 

original position; the goal is to show an expression of anger. 
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Fig. 3. Overview of the receptionist robot ASKA and its joints-link model. 

The duration of each of these BEs was about 14 [sec]. Their appropriateness and their 
expressiveness was tested experimentally using questionnaires (see section 3.1). 



2.3 Assessment of impression and expressiveness of bodily expressions 

There are mainly two types of methods to assess the effects of a particular action on a 
human. The classic self-reporting approach is widely used, while the assessment from 
measured physiological information is still an open research problem. The first type of 
methods gives subjective evaluation results; whereas the second type of methods is deemed 
to be more objective but suffers from inaccuracies. For our case, in order to assess 
expressiveness we adopted a self-reporting approach and asked the subjects to answer 
questionnaires. However, in order to assess impression the subjects answered 
questionnaires and their brain activity was also recorded. 

Summarizing the subject's answers to questionnaires was used in order to assess 
expressiveness. Every subject had to select from: expression of happiness, expression of 
surprise, expression of sadness, expression of anger, or no meaningful expression. The 
subject also had to specify the degree of the expression in a scale of five: 1 for impertinent, 2 
for slight, 3 for medium, 4 for strong and 5 for very strong. This selection of the degree of 
expression is a redundancy that was meant to confirm the subject's choice and assess the 
degree of confidence in his/her answer. These answers were then categorized into pleasant 
or unpleasant expressions using the subset of Shaver's classification shown in Figure 2. 
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As for impression assessment, spectral analysis method of electroencephalogram (EEG) data 
was used. A short EEG segment can be considered as a stationary process, which can be 
characterized by an autoregressive (AR) model. Let us denote S\Jl) as a sample of EEG 

data of N points. We calculate T f \Yl) and T h (jl) , respectively the forward and backward 
prediction errors, as follows: 



r f (n) = ^a(k)s(n + p - k) 



(1) 



/,=o 



r b {n)= ^a(k)s(n + k) 



(2) 



k=o 



where a{k) is the AR parameters and p is the order of the model. The order p is based 
on the "goodness of fit" criterion. We use the relative error variance (REV) criterion (Schlogl 
et al., 2000), defined as: 

MSE{p)/ 

/MSY 



REV(p) = 



(3) 



MSE(p) is the mean square error or variance of the error process of order p , and 
MSY is the variance of the total power of the signal sample. The optimal p is the one that 
minimizes RE V(p) . In our case we take p = 1 4 . 
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Fig. 4. An example illustrating the calculation of the power of low-alpha band for a 
2[sec] data segment taken from electrode placement F3. The graph to the left shows the 
raw EEG signal for the baseline period and the observation period. The graph to the 
right shows the power spectra of the EEG signals, where low-alpha frequency band is 
highlighted. 

We apply (1) and (2) to calculate an approximate estimation of the power spectrum 
PS\J) of the signal S as follows: 
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VJ ( 4 ) 

PS{f)- 



X + Y^a^Y 11 * 1 
V p = X -Y(VM 2 + [r h {nf\ ( 5 ) 



where y is the averaged sum of the forward and backward prediction error energies and T 

is the sampling period. 

Research in cognitive neuroscience has shown that the power of low-alpha frequency band 
is the most reactive band to social cues such as movements (Allison et al., 2000; Cochin et al., 
1998). We suppose that this frequency band reacts in a similar way to robot bodily 
expressions (Khiat et al., 2006). The next step in assessing the impression is to observe the 
amount of change in the power of low-alpha frequency band compared to the whole 
spectrum. The power L of a band between frequencies a and b is defined by: 



[PS(f)df 



Using (6), we calculate the power of low-alpha band frequency L h for the data taken during 

the baseline period and L m for the data taken during the period of the execution of a bodily 

expression. An example illustrating this calculation is shown in Fig. , where raw 2 seconds 
EEG signals collected during the baseline period and the observation period is shown to the 
left. The power spectrum of these signals is shown to the right, and the low-alpha frequency 

band is highlighted. A comparison between L h and L would indicate the effect of a 

particular bodily expression on the user. This is used as the main evaluation criterion for 
impression. 

3 Experimental study 

3.1 Expressiveness of robot bodily expressions 

The goal of this experiment is to evaluate the expressiveness of every generated robot bodily 
expression. Since this quality is highly subjective, the self-reporting approach is used. 
Subjects. Seventeen (17) participants (two females and fifteen males aged between 20 and 50 
years old) volunteered to take part in this experiment. They were either students or faculty 
members at the Graduate School of Information Science. They were all familiar with robots 
and had previous experiences of dealing with robots similar to the one used in the experiment. 
Procedure. Every subject was shown a total of six bodily expressions, which were described 
in section 2.2. The execution of each of the bodily expressions by the humanoid robot ASKA 
lasted 14 seconds. After observing each robot bodily expression, enough time was given to 
the subject to answer two questions about the expressiveness of that robot bodily expression, 
and one more question about their impression after the observation. These answers were 
then summarized as explained in section 2.3 to assess their expressiveness. 
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BEs 


Pleasant 


Unpleasant 


Neither 


BE1 


100% 


0% 


0% 


BE2 


6% 


35% 


59% 


BE3 


94% 


6% 


0% 


BE4 


0% 


94% 


0% 


BE5 


65% 


12% 


23% 


BE6 


0% 


82% 


18% 



Table 1. Users' evaluations of the expression of each generated robot bodily expression (BE). 

Results. Table 1 shows the results about bodily expressions after categorization into 
pleasant expressions, unpleasant expressions, or neither, clearly indicating the 
expressiveness as evaluated by the observers. The result about impressions is presented in 
Table 2 after categorizing the answers into pleased or unpleased. 

These results demonstrate the existence of a strong correlation between the expressiveness 
of the robot bodily expressions as seen by the subjects and the target expression when these 
bodily expressions were generated (see section 2.2). BE1, which was created to express 
happiness, was classified as having a 100% pleasant expression. BE2, which was created to 
express a neutral emotion, was classified by 59% as neither pleasant nor unpleasant, and by 
35% as unpleasant, suggesting that neutral bodily expressions can have a negative 
connotation. BE3, which was created to express surprise, was classified by 94% as a pleasant 
expression. BE4, which was generated to express sadness, was classified by 94% as being an 
unpleasant expression. Similarly, BE6 which was created to express anger was also classified 
by 82% as an unpleasant expression. The special case of BE5 was classified to a great extent 
as a pleasant expression by up to 65%. However, 23% said it did not express anything in 
particular and 12% claimed it was unpleasant. 



BEs 


Pleased 


Unpleased 


Neither 


BE1 


65% 


35% 


0% 


BE2 


30% 


70% 


0% 


BE3 


68% 


32% 


0% 


BE4 


19% 


81% 


0% 


BE5 


100% 


0% 


0% 


BE6 


47% 


53% 


0% 



Table 2. Users' evaluations of their impressions after observing each robot bodily expression 
(BE). 

The expressiveness of the generated BEs is confirmed to be in accordance with the target 
expressions for which they were created. BEs generated to express happiness and surprise 
expressions were classified as pleasant, and the BEs generated to express sadness and anger 
expressions were classified as unpleasant. Among the generated BEs we could choose one 
that is representative of each category in order to use it in the evaluation of its impressions 
on the observer. 
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3.2 Impressions of robot bodily expressions 

The goal of this experiment is to evaluate the impression on the observer of the generated 
bodily expressions using a hybrid approach that combines the results of self-reporting and 
the analysis of brain activity. 

Subjects. Seven (7) participants (one female and six males, 23-43 years old) volunteered to 
take part in this experiment. They were all students or faculty members at the Graduate 
School of Information Science, and only two of them had the experience of using 
electroencephalography to measure brain activity. Before starting the experiment each 
participant was fitted with electrodes and allowed to spend more than 20 minutes reading 
books of interest to familiarize and condition them to the electrodes' presence. 
Procedure. During each session, 12 EEG channels (using sintered Ag/AgCl electrodes) were 
recorded by the 5200 Series DigitalAmp System 4 . The recording was performed from 10 
placements, namely: Fpl, Fp2, F3, F4, T3, T4, P3, P4, Ol, and 02 according to the international 
10-20 standard (see Fig. 5). The placement Fz was used as the ground, and the signal from the 
left ear placement Al was used as the reference signal. The contact impedance between all 
electrodes and the skull was kept below 5[kQ]. The subjects were shown a total of six motions 
lasting 14 seconds each by the humanoid robot ASKA while their brain activity was recorded 
with 16-bit quantization at a sampling frequency of 200[Hz]. 
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Fig. 5. The experimental setup where brain activity was measured according to the 10-20 
international standard (Jasper, 1958). 

The subjects were asked to relax as much as possible and think of nothing in particular 
when recording the baseline period, which lasted for 14 [sec]. They were also told that they 
would be asked about the robot's movements and that they had to watch carefully when the 
robot was moving. This was important because we needed to make sure that the subjects 
attended to the task. After the observation of each bodily expression, the subjects described 



4 The 5200 Series DigitalAmp System is a trademark of NF Corporation, Yokohama. 
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their impression in their own words. Having no constraints to express themselves, the 
subjects gave more details about their impressions. These answers were used in categorizing 
the impressions into pleased or unpleased based on Shaver's original classification of 
emotions (Shaver et al., 1987). 

Results. Table 2 shows the self-reporting result about the subjects' impressions after 
observing every robot bodily expression. There is a strong correlation between these results 
and the expression results, reported previously in section 3.1, with a coincidence level of 
71%. For example, BE4 impression was considered to be unpleasant by up to 81% and its 
expressiveness was considered unpleasant by 94%. This is also the case for BE1 where its 
impression of being pleasant is 65%, and it expression of being pleasant is 100%. The same 
could be said for BE3, with a pleasant impression of 68% and a pleasant expression of 94%. 
The case of BE6 is different from the previous ones. While its expression was considered 
unpleasant by 82%, its impression shows the small rate of 53% for being unpleasant and 
47% for being pleasant. It is still inclined to the unpleasant side. However, its pleasant effect 
cannot be explained knowing that this bodily expression was created to express anger. The 
last case of BE2 shows a big difference between its 59% neutral expression and its 70% 
unpleasant impression. 



Subject 


Category 


Electrodes 
Fpl Fp2 F3 F4 T3 T4 P3 P4 Ol 02 


1 


Pleasant 
Unpleasant 


___ + ____ + _ 
- + - + + --- + - 


2 


Pleasant 
Unpleasant 


______ + + __ 


3 


Pleasant 
Unpleasant 


____ + + _ + __ 


4 


Pleasant 
Unpleasant 


_____ + + + __ 
______ + + + _ 


5 


Pleasant 
Unpleasant 


---- + + + + -- 


6 


Pleasant 
Unpleasant 


+ + -- + + ---- 
____ + + ____ 


7 


Pleasant 
Unpleasant 


-- + - + + --- + 
__ + ____ + __ 



Table 3. Significant change in low-alpha power according to observed motion categories at 
every electrode and for each subject. (+: significant change p<.05; -: no significant change). 

This suggests that bodily expressions with a neutral expression can be perceived negatively 
and can generate an unpleasant impression. The analysis of EEG data using the method 
described in section 2.3 allowed the calculation of the power £, of low-alpha frequency 

band in each electrode channel and for each bodily expression. It also allowed the 
calculation of the power r of the same frequency band for the baseline period. Comparing 

L and L revealed the effect of observing a bodily expression in the change in the power of 

low-alpha frequency band for each electrode channel. Table 3 summarizes the results of this 
change in power, where only statistically significant change is indicated with the symbol +. 
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It can be seen that significant effect is mostly present at locations T3 and T4, then at P3 and 
P4, and finally at F3 and F4. Knowing that these positions are located above the superior 
temporal sulcus (STS) and above some specific parts of the prefrontal cortex (PFC) confirms 
previous research findings about the activation of STS in the perception of social cues 
(Cochin et al., 1998; Allison et al., 2000), and the activation of the mirror neurons located in 
the PFC during learning and imitation tasks (Rizolatti & Craighero, 2004). Some reaction can 
also be seen at other locations, for instance Ol and Fp2 for subject 1, 02 for subject 7, Fpl 
and Fp2 for subject 6. The reaction at locations Fpl and Fp2 are thought to be the result of 
blinking activity during the recording process, since these electrode positions are the closest 
to the eyes. It is important to assert that no preprocessing was done to avoid data with eye 
blinking artifacts. This approach was adopted because the goal is to conduct this 
investigation in natural conditions, where blinking activity is possible and should be 
considered. The reaction at locations Ol and 02 could be explained by the fact that during 
the vision process the visual cortex gets activated and this activation is usually captured at 
locations Ol and 02. 

Nevertheless, the reactive locations were not always the same among different observers, 
suggesting high individual differences. A generalization cannot be derived at this point 
about the reaction of brain locations according to the category of the bodily expression that 
is being observed. However, the presence of a reaction is confirmed and another approach is 
necessary to achieve a more comprehensive result. On the other hand, there is a need to 
assess the repeatability of similar reactions from the same observer when he/she is shown 
the same bodily expression. 

3.3 Repeatability of reaction in brain activity 

The goal of this experiment is to confirm that the results obtained in the impression 
experiment (see section 3.2) are consistent over time for the same person. In other words, to 
make sure that brain reaction does happen all the time and at the same set of electrodes if a 
subject observes the same bodily expression several times. 

Subject. One (1) student (male, 32 years old) volunteered to take part in this experiment. 
Similar to the previous experiment, the subject was fitted with an electro-cap and was given 
about 30 minutes to familiarize and get used to the presence of the cap. 

Procedure. The subject participated in ten recording sessions. In each session, he was shown 
two bodily expressions, one for each category of bodily expressions, executed by the 
humanoid robot ASKA. Showing only representative bodily expressions is sufficient since 
the goal is to confirm the repeatability of brain reaction. Each bodily expression lasted for 
14[sec], and the baseline period was recorded during the 14[sec] before the execution of 
each bodily expression. BE1 was chosen as a representative of pleasant bodily expressions, 
and BE4 was chosen as a representative of unpleasant bodily expressions. On one hand, 
BE1 was chosen because its expressiveness evaluation as pleasant (100%) was the highest 
among all the bodily expressions. Its impression evaluation (65%) was high enough to 
make sure it will have the desired effect on the observer, even though its impression was 
evaluated as the lowest among all the pleasant bodily expressions. In this case, the 
advantage was given to the expressiveness evaluation over the impression evaluation. On 
the other hand, BE4 was chosen because, similarly to BE1, its expressiveness evaluation as 
unpleasant (94%) was the highest among all the bodily expressions. Its impression 
evaluation (81%) was also the highest among all the bodily expressions, making it the 
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perfect candidate to represent unpleasant bodily expressions. The recording was 
performed from 12 placements of an electro-cap 5 , namely: F3, F4, C3, C4, P3, P4, F7, F8, T3, 
T4, T5, and T6 (see Fig. 5), using the Polymate AP1132 6 EEG recording device. The 
sampling frequency was set to 200[Hz] and the impedance was kept below 5[kQ]. As a 
result of the experiment in section 3.2, the placements Fpl, Fp2, Ol, and 02 were omitted, 
since they were shown to be not of a big importance. On the other hand, new placements 
were introduced, namely: F7, F8, C3, C4, T5, and T6, in order to get a more detailed 
coverage of the prefrontal and temporal cortices. 

During the recording of the baseline period the subject was asked to relax as much as 
possible and to think of nothing in particular. To confirm the attendance to the task, the 
subject was told that he would be asked about the robot's movements and that he had to 
observe carefully. After the observation of each bodily expression, he was asked to 
explain the difference between the recently observed bodily expression and the one just 
before. 



IC3 baseline 
|C3 observation 

]C4 baseline 
JC4 observation 




Bodily Expressions 
Fig.6. Mean alpha power calculated for the ten trials and for electrodes C3 and C4 (*: 
p< .05). 

Results. Analysis of the collected data, using the method described in section 2.3, resulted in 
identifying the electrode channels of placements C3 and C4 as the most reacting for this 
subject. Figure 6 shows the mean power of low-alpha frequency band calculated from the 10 
trials for the electrode placements C3 and C4: where C3 reacted significantly to the pleasant 
bodily expression and C4 reacted significantly to the unpleasant bodily expression. 
Fig. shows the overall result for the two bodily expressions by averaging the power 
change for all the electrodes over the 10 trials. The difference in means is significant 
between BE1 and BE4. Since BE1 is representative of pleasant bodily expressions and 
BE4 is representative of unpleasant ones, this result suggests an overall significant 
decrease in the power of low-alpha frequency band for pleasant motions, and a 
significant increase in power of low-alpha frequency band for unpleasant ones. This 



5 Electro-Cap is a trademark of Electro-Cap International Inc., USA. 

6 Polymate AP1132 was designed by Digitex Lab. Co. Ltd, and is commercialized by TEAC Corporation, Tokyo. 
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confirms that the change in low-alpha power happens every time the observer watches 
a bodily expression, and that this change is inversely proportional to the category of the 
observed bodily expression. 



"-:Tfi 






BE1 BE4 

Bodily Expressions 
Fig. 7. Change in alpha power from the baseline using all electrodes, for each of the 
considered unpleasant and pleasant bodily expressions (* : p < :05). 



3.4 Discussion 

The results presented in Table 1 confirmed the appropriateness of the expressiveness of the 
generated bodily expressions used in the experimental study. They show that the 
unpleasant bodily expressions were classified as unpleasant, and the pleasant bodily 
expressions were also classified as pleasant. During every experiment, the order of which 
the bodily expressions were shown to the observer was random so as not to allow the 
prediction of the nature of the next bodily expression. Although the subjects were not told 
anything about the bodily expressions, their answers agreed with the hypothesis. This 
implies that people tend to see bodily expressions in similar ways, which facilitates their 
interpretations and use in interactions. There exist a shared basic knowledge that allows 
humans the proper interpretation of similar expressions, although this knowledge is highly 
affected by the environment factors of culture and local customs. The bodily expressions 
were treated by the observers as if they were performed by a human even though it was the 
robot ASK A which actually performed them. It would be interesting to compare the 
interpretation of the same bodily expressions executed by both humans and robots to 
evaluate the existence of interpretation differences. 

On the other hand, Table 2 correlates to a high extent with the results of Table 1. Here we 
can infer that observing a pleasant bodily expression will result in a pleasant impression 
on the observer and vice versa. This means that the observer is affected by what he sees 
even though the performer is just a robot. This effect on the observer is shown to be 
present in his/her brain activity with the results of section 3.2. Although, a generalization 
could not be concluded from the obtained results, the presence of a reaction in brain 
activity was proven. 

It is important to acknowledge that the most reactive electrode positions were F3, F4, 
T3, T4, P3 and P4, which are located above the STS and PFC. This supports the claims that 
STS and mirror neurons get activated during the perception of social cues and the 
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observation of movements (Allison et al., 2000; Cochin et al., 1998), and that this can be used 
effectively in the implementation of Brain-Machine Interfaces (Nicolelis, 2001; Wessberg et 
al., 2000). 

Finally, it was necessary to confirm the repeatability or the reproducibility of the same 
reaction in similar conditions. The results showed that the power level of low-alpha 
frequency band over all brain activity was inversely proportional to the category of the 
observed bodily expression. Particularly the most significant reaction was present at 
electrode positions C3 and C4 for the considered subject. These positions are close to the 
premotor and motor cortices. Due to the low spatial resolution of EEG, it is difficult to assert 
precisely which part of the brain is responsible for these reactions. However, current 
research findings confirm that the STS has an important role in the interpretation of social 
cues (Allison et al., 2000), and that mirror neurons are important during learning and 
imitation tasks (Rizolatti & Craighero, 2004). 

4. Recognition of the impressions using Self-Organizing Maps 

Self-Organizing Maps or Kohonen Networks are well suited to represent and generalize 
input data with an underlying structure that is not easily grasped (Kohonen, 1982). The 
unsupervised learning process tries to give a representation to the high-dimensional data 
rather than only a classification. Both the metric relation and the probability density of the 
data is approximated during this process, allowing the classification of newly measured 
data. Once the learning process has terminated, a labeling of the learned map can be done 
by using a relatively small set of labeled data items. The resulting map could be used to 
monitor the topographic patterns related to specific events as in (Joutsiniemi et al., 1995), or 
it could be used in the recognition of newly observed data. The similarity between a d- 

dimensional feature vector x = (x, xA ar, d a prototype vector M = (m,,...,m . ) m the 

learned map is calculated using the weighted Euclidean distance defined as: 

D 2 {X,M)=f,w J (x J -m J f (7) 

7=1 

where W ■ is the weighting factor which can be used to give preference to certain features 

over others. This proved helpful and important in the semi-assisted learning of the data 
structure that was necessary for our EEG data. The usual approach in using SOM starts by 
preprocessing the selected data. Then, a feature extraction method is specified and used. 
After that, the map is calculated using competitive learning. Finally, the resulting map is 
labeled and used for recognition. It is important to note that in practical applications the 
selection and preprocessing of data is of extreme importance, because unsupervised 
methods only illustrate the structure in the data set, and the structure is highly influenced 
by the features chosen to represent data items (Kaski, 1997). In the following we will show 
how we used SOM in the recognition of bodily expressions executed by a humanoid robot 
and of similar bodily expressions executed by a human. 
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4.1 Recognition of the impressions of robot bodily expressions 

4.1.1 Data acquisition 

In the recognition of the impressions of robot bodily expressions we use the same data as in 
section 3.2. The data consists of all the signals collected at a sampling rate of 200[Hz] from 
the 10 EEG placements for twice 14 seconds and for each one of the seven subjects. 

1 raw EEG 

I Time 
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Fig. 8. Preprocessing EEG data for features extractions is done by calculating a moving 
average of overlapping windows of predefined length (3 [sec]). 

4.1.2 Data preprocessing 

To prepare the collected data for the training task, we calculate the moving averaged power 
spectra of 10x2x7=140 signal sources. As shown in Fig. , we apply a 3[sec] (600-point) 
Hanning window on the signal with l[sec] (200-point) overlap. The windowed 3[sec] epochs 
are further subdivided into several l[sec] (200-point) sub-windows using the Hanning 
window again with l/2[sec] (100-point) overlap, each extended to 256 points by zero 
padding for a 256-point fast Fourier transform (FFT). A moving median filter is then applied 
to average and minimize the presence of artifacts in all the sub-windows. The resulting 
moving averaged power spectrum is then reduced to six (6) features by integrating the 
spectral values weighted by six raised-sine shaped windows with an area normalized to 
unity. Consequently, the feature components do not need to be normalized. In a similarly 
way to (Toutsiniemi et al., 1995), the weighting windows are overlapping to ensure a smooth 
change of the features in accordance with the change in power spectrum. These windows 
cover the following frequency bands: 
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♦ Delta : 00-04 [Hz] 

♦ Theta : 04-08 [Hz] 

♦ Low-Alpha: 08-11 [Hz] 

♦ High-Alpha: ll~13[Hz] 

♦ Beta : 13-30 [Hz] 

♦ Gamma : 30-50 [Hz] 

Notice that the alpha frequency band is divided into low and high. This will allow us to give 
different importance coefficients to each frequency band, according to each one's contribution in 
recognizing the effect of bodily expressions. The resulting time series of EEG power spectrum 
features consists of a vector of 10x6=60 features every 2[sec] (400-point) time intervals. 

4.1.3 Map training and recognition results 

The 2D map to learn using the collected features is arranged as a 2D lattice, with each 
location containing a 60-dimensional prototype vector. During the learning process or the 
self-organization, the importance coefficients w , used in the similarity metric (7), were 

taken as 0.5, 0.5, 1.0, 0.9, 0.5, and 0.3 for the features delta, theta, low-alpha, high-alpha, beta, 
and gamma, respectively. Higher importance was given to the alpha bands with an 
emphasis on the low-alpha band, as a result of its sensitivity to the category of bodily 
expression being observed by a subject (see section 3.2). On the other hand, the higher 
frequency gamma band was given the lowest importance coefficient, since it was not proven 
to react significantly to social cues (Cochin et al., 1998). 

After the training of the map, an approximation of the probability density of the input data 
is reached generating clusters which can be identified as associated to one of the following 
experimental conditions: observing unpleasant bodily expression, observing pleasant bodily 
expression, or baseline condition. 80% of the data was used for the training and the 
remaining 20% was used for the evaluation. The resulting recognition rate was of 65.2%, 
divided as 62.8% for data associated to unpleasant bodily expressions, 59.3% for data 
associated with pleasant bodily expressions, and 73.5% for data associated to baseline. 
Clearly, the rate of 65.2% is not satisfactory since this is a low rate to rely on when making a 
decision. However, when individual data was used separately in the learning process the 
recognition rate jumped high to the 79.5%. The previously realized low rate is explained by 
the existence of conflicting data items collected from different subjects. In order to 
understand the effect on the recognition rate of using data from different sources (subjects), 
all possible combinations of data sources were used to learn several maps and the 
recognition rates calculated. The change in recognition rates is shown in Fig. . It can be 
noticed that the addition of a new source of data decreased the recognition rate by about 5% 
for the first two additions. However, there is certain stability in the rate when the number of 
data sources was 3, 4 and 5. But again the rate decreased by a lower factor when adding 
more sources. This decline is explained by the existence of individual differences in the 
reaction to bodily expressions and probably even the interpretation. This outcome was also 
observed in the previous experiment about impressions (see section 3.2). To cope with this 
problem it is recommended to train several SOMs with a small number of data sources and 
use the totality in the recognition process. 
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4.2 Recognition of the impressions of human bodily expressions 

There is little knowledge of the difference in the effect left on a person when observing 
humans and when observing robots. Most of the literature reports either cases (Ito & Tani, 
2004; Allison et al., 2000; Schaal, 1999) and there is, to our knowledge, no previous work that 
tried to draw on the parallels between the two cases. Therefore, it is worthwhile to 
investigate similarities and differences in the recognition of the category of the impressions 
left on the observer for both cases. 

In order to collect brain activity data when subjects are observing human bodily expressions, 
we conducted an experiment similar the one for the expressiveness of robot bodily 
expressions (see section 3.2). The goal is to evaluate the impression on the observer of the 
bodily expressions described in section 2.2. 
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Fig. 9. Recognition rates when using data from different sources (subjects), when observing 
robot's bodily expressions. 



4.2.1 Data acquisition 

Subjects. Three (3) participants (males aged between 21 and 23 years old) volunteered to 
take part in this experiment. They were students at the graduate school of information 
science. They were all familiar with the experiment since they did participate in the previous 
one about the impressions of robot bodily expressions. Their brain activity was collected 
with an EEG measurement device and they were familiarized with the presence of the 
electrodes by letting them spend about 20 minutes reading books or surfing the Internet. 
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Fig. 10. Recognition rates when using data from different sources (subjects), when observing 
a human performer. 
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Procedure. The human bodily expressions to be shown to the subjects were prepared 
beforehand. A volunteer in a black tight suit performed these bodily expressions. He had a 
black cover on his head to make sure that no facial expressions get to the observer. These 
bodily expressions were a reproduction of the bodily expressions described in section 2.2 and 
were recorded on HDTV 7 video tapes. During the experiment, the bodily expressions were 
projected on a display big enough to ensure that the projected images of human performer is 
as close as possible to real size. Similar to the experiment in section 3.2, the recording was 
obtained from 12 electrode locations, namely: F3, F4, F7, F8, C3, C4, P3, P4, T3, T4, T5, T6, 
using the Polymate AP1132 EEG recording device. The sampling frequency was 200[Hz] and 
the impedance was kept below 5[k£2], The subjects were shown a total of six bodily 
expressions lasting 10[sec] each. During the recording of the baseline period, the subjects were 
asked to relax as much as possible and to think of nothing in particular. To help them achieve 
this state of mind, they were shown for 10[sec] the empty space of the room where the 
recording of the bodily expressions was performed. In addition, to confirm the subjects 
attended to the task they were told that they would be asked about the bodily expressions and 
that they had to observe carefully. After each observation, they were asked to explain the 
difference between the bodily expressions they just observed and the previous ones. 



Performer of BEs 


Human 


Robot 


Data sources 


Individual 


Combined (3) 


Individual 


Combined (7) 


Recognition rate (%) 


79.0 


70.3 


79.5 


65.2 



Table 4. Comparison of the recognition rates for the cases of human bodily expressions and 
robot bodily expressions. 

4.2.2 Recognition results 

The preprocessing of the data was the same as in the previous case (section 4.1.2). Moreover, 
80% of the data was used for training and the remaining 20% was used for evaluation. The 
resulting recognition rate was 70.3%, divided into 68.0% for data associated to the 
observation of unpleasant bodily expressions, 67.5% for data associated to the observation of 
pleasant bodily expressions, and 80.4% for data associated to the baseline. This rate is much 
better than the 65.2% rate achieved with the data of robot bodily expressions. However, the 
number of sources in this case is only 3, while it was 7 for the robot case. Thus, it is more 
accurate to compare this result to the 70.8% rate achieved when using only 3 data sources 
for the robot case, as shown in Fig. . 

Using data from different sources showed degradation of the recognition rate, similar to the 
result of the robot, see Fig.. The addition of one source resulted in a decrease of 5.1%, then a 
decrease of 3.6% after adding a third source. This result supports the fact that individual 
differences remain present even in the case of observing bodily expressions performed by a 
human. 

4.3 Discussion 

Regardless of the performer, whether a robot or a human, the recognition rates of the 
category of the observed bodily expression was about 80% when using subject's data 



7 HDTV stands for High Definition television a.k.a High- Vision which allows the recording of a high resolution video 
stream. 
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individually. However, this rate decreased significantly when additional data from other 
subjects was used in the training process. To cope with this problem it would be interesting 
to train one SOM for each data source, and then combine the resulting SOMs into a bigger 
structure for the recognition task. Adopting this approach could result in keeping a high 
recognition rate while taking into consideration all the data that was collected so as not to 
loose the generality of the solution. 

It is interesting to note that the average difference between the recognition rates for robot 
and human cases is relatively small as summarized in Table 4. This proves that SOMs are 
suitable to the generalization of the effect in the input EEG signals regardless of whether this 
effect is generated by a human or a robot. Even if differences appear clearly when analyzing 
raw EEG data, the SOMs succeed in eliminating these differences and keep only the 
important information. SOMs also succeed in separating the noise and artifacts from signals 
reflecting brain activity. This is another powerful characteristic that helps in the online 
processing of EEG signals for applications related to Brain-Machine Interfaces (BMI). 

5 Conclusions and future work 

In this paper, we investigated the relation between bodily expressions and their 
impressions on the observer. We started by generating six bodily expressions, then we 
classified them to belong to two categories according to their expressiveness, namely: 
pleasant and unpleasant. Their expressiveness was confirmed statistically by a self- 
reporting experiment where a number of volunteers answered questionnaires about the 
bodily expressions. Afterwards, we conducted an experiment to assess the impressions on 
the observers while watching the considered bodily expressions by collecting the 
observer's brain activity using electroencephalogram (EEG). The method adopted for 
spectral analysis revealed a correlation between the power level of low-alpha frequency 
band (8~ll[Hz]) and the category of the observer bodily expression. The reproducibility 
or repeatability of this band's reaction was confirmed with a third experiment where a 
subject observed repeatedly candidate bodily expressions for each category. These results 
have opened the opportunity to utilize the change in the power level of low-alpha 
frequency band to examine the capacity of a humanoid robot in activating the social 
perception system in a human observer. A challenging problem that rises from this result 
is about the degree to which such reaction appears when observing robots with different 
human-like physical and behavioral characteristics. The understanding of which robot 
properties are necessary or sufficient to activate the social perception system in an 
observer is of particular interest. 

Another important direction was to define a method which can assess and recognize the 
impression category from the observer's brain activity. For this reason, we presented a 
computational method to use for the recognition of the impression of bodily expressions. 
The self-organizing maps (SOM), which allows at the simultaneous reduction of the amount 
of data and its projection to a lower dimensional space, was used for the recognition. It was 
shown that SOMs achieve relatively high recognition rates considering that the data we 
used is not filtered for noise elimination. The existence of high individual differences in the 
considered data was handled successfully with the SOM, because of its ability to separate 
signals resulting from different processes. 

Future research directions should focus on improving the recognition rate to over 90% and 
to try to recognize a refined classification of bodily expressions. A link with human motion 
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styles (Hsu et al., 2005) would be interesting to provide more details about the bodily 
expressions. There is an extensive work on generating emotional motions (Amaya et al. 
1996; Lim et al. 2004) that could be incorporated in this research. Reaching this goal would 
enable us to build an adaptive interface that makes the robot judge the effect of its own 
gestures on the users and allows it to change its behavior accordingly. 
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Appendix: Mathematical expression of Laban features of the robot ASKA 



Right 
point 




Fig. 11. Diagram of table plane superposed on a top view of the robot ASKA. 

The mathematical definition of Laban features (Shape and Effort) using the robot's 
kinematic and dynamic information is given such that larger values describe fighting 
movement forms and smaller values describe indulging ones (Tanaka et al., 2001). Bartenieff 
and Lewis stated in (Bartenieff & Lewis, 1980) that the Shape feature describes the 
geometrical aspect of the movement using three parameters: table plane, door plane, and 
wheel plane. They also reported that the Effort feature describes the dynamic aspect of the 
movement using three parameters: weight, space, and time. The robot's link information 
which will be used in the features definitions is given in Fig. . In order to simplify the 
mathematical description, a limited number of joint parameters were considered in this 

definition, namely: the left arm u n , the right arm u rl , the neck O^ , the face 2 , the left 

wheel G), , and the right wheel C0 r . The remaining parameters were fixed to default values 

during movement execution. 

Using the diagram shown in Fig. , the table parameter of feature Shape represents the 
spread of silhouette as seen from above. It is defined as the scaled reciprocal of the 
summation of mutual distances between the tips of the left and the right hands along with a 
focus point, as shown in (8). 



Shape 



table 



+ T +T 

L LF ~ * RF ~ A LR 



) 



(8) 



where, 



T LF = J(L F cos Jj cosS 2 -L a cos# n ) 2 



+ 



1/7 ~ ^ F C0S ^ S ^ n ^ 2 ) ' 



T RF = Jj {L F cos <5"i cosS 2 - L a cos 6 rl f + \~yC - L F cos S x sin S 2 j , 
T LR =^jsh 2 + {L A cos^ n -L a cos0 rl f , 
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The point of focus is set at the fixed distance r = 44 [cm] in the gaze line of the robot's head. 
Sh = 33 [cm] is the distance between the shoulders; L = 44 [cm] is the arm's length during 
movement execution and S is a scaling factor. The door parameter of feature Shape 
represents the spread of the silhouette as seen from the font. It is defined as the weighted 
sum of the elevation angles of both arms and the head as shown in (9). The sine is used to 
reflect how upward or downward is each joint angle. The weights d ,d ,d were fixed 

empirically to 1. 

Shape door = df sin n + d r sin rl + d n sin 8 n \ (9) 

The wheel parameter of feature Shape represents the lengthwise shift of the silhouette in the 
sagittal plane. It is defined as the weighted sum of the velocities of the robot and the 

velocities of the arm extremities as shown in (10). Weights were set empirically to -8 for W t , 
to -1 for Wj and W r . 

Shape „ heel = w,v lr + w,L A —cos n + w r L A —cos rl (10) 

dt dt 

The weight parameter of feature Effort represents the strength of the movement. It is 
defined in (11) as the weighed sum of the energies exhibited during movement per unit time 
at each part of the body. Weights were adjusted with respect of to the saliency of body parts. 
Relatively large weights e — e = 5 were given to the movement of the trunk and smaller 

weights were given to the movements of the arms e ; = e r = 2 and the neck e„ = 1 . 

Effort weight = e,0fl + e r e\ + ejl x + e tr v 2 tr + e rt v 2 , (11) 

where v tr = d>i + co r is the translation velocity and v rt = d)j - co r is the rotation velocity. 
The space parameter of feature Effort represents the degree of conformity in the movement. 
It is defined in (12) as the weighed sum of the directional differences between elevation 
angles of the arms and the neck as well as the body orientation. Weights are also defined 
empirically by giving advantage to the arms' bilateral symmetry s lr = -5 and body 

orientation S rt = — 5 over the other combinations S ln = S m = — 1 . 

E ff°rt sp ace = s rl \®rt \ + s Ir |#/1 " #rl | + s l„ |#/1 " <?«1 1 + s m |#rl " ^nl | (I 2 ) 

The time parameter of feature Effort represents the briskness in the movement execution 
and covers the entire span from sudden to sustained movements. It is defined in (13) as the 
ratio indicating the number of generated commands per time unit. 

„„ number of generated commands /ho\ 

Effort time = (1 3) 

time span 
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1. Introduction 

The bipedal structure is one of the most versatile ones for the employment of walking robots. 
The biped robot has almost the same mechanisms as a human and is suitable for moving in an 
environment which contains stairs, obstacle etc, where a human lives. However, the dynamics 
involved are highly nonlinear, complex and unstable. So it is difficult to generate human-like 
walking motion. To realize human-shaped and human-like walking robots, we call this as 
humanoid robot, many researches on the biped walking robots have been reported [1-4]. In 
contrast to industrial robot manipulators, the interaction between the walking robots and the 
ground is complex. The concept of the zero moment point (ZMP) [2] is known to give good 
results in order to control this interaction. As an important criterion for the stability of the 
walk, the trajectory of the ZMP beneath the robot foot during the walk is investigated [1-7]. 
Through the ZMP, we can synthesize the walking patterns of humanoid robot and 
demonstrate walking motion with real robots. Thus ZMP is indispensable to ensure dynamic 
stability for a biped robot. The ZMP represents the point at which the ground reaction force is 
applied. The location of the ZMP can be obtained computationally using a model of the robot. 
But it is possible that there is a large error between the actual ZMP and the calculated one, due 
to the deviations of the physical parameters between the mathematical model and the real 
machine. Thus, actual ZMP should be measured to realize stable walking with a control 
method that makes use of it. 

In this chapter, actual ZMP data throughout the whole walking phase are obtained from the 
practical humanoid robot. And evolutionary inductive self-organizing network [8-9] is 
applied. So we obtained natural walking motions on the flat floor, some slopes, and uneven 
floor. 

2. Evolutionary Inductive Self-organizing Network 

In this Section we will depict the evolutionary inductive self-organizing network (EISON) to 
be applied to the practical humanoid robot. Firstly the algorithm and its structure are shown 
and evaluation to show the usefulness of the method will be followed. 
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2.1 Algorithm and structure 

The EISON has an architecture similar to feed-forward neural networks whose neurons are 
replaced by polynomial nodes. The output of the each node in EISON structure is obtained 
using several types of high-order polynomial such as linear, quadratic, and modified 
quadratic of input variables. These polynomials are called as partial descriptions (PDs). The 
PDs in each layer can be designed by evolutionary algorithm. The framework of the design 
procedure of the EISON [8-9] comes as a sequence of the following steps. 

[Step 1] Determine input candidates of a system to be targeted. 

[Step 2] Form training and testing data. 

[Step 3] Design partial descriptions and structure evolutionally. 

[Step 4] Check the stopping criterion. 

[Step 5] Determine new input variables for the next layer. 

In the following, a more in-depth discussion on the design procedures, step l~step 5, is provided. 

Step 1: Determine input candidates of a system to be targeted 

We define the input variables such as x u ,x 2i ,--- x m related to output variables y t , where N and 

i are the number of entire input variables and input-output data set, respectively. 

Step 2: Form training and testing data. 

The input - output data set is separated into training ( n lr ) data set and testing ( n u , ) data set. 

Obviously we have n = n u + n u . The training data set is used to construct a EISON model. 
And the testing data set is used to evaluate the constructed EISON model. 

Step 3: Design partial descriptions(PD) and structure evolutionally. 

When we design the EISON, the most important consideration is the representation strategy, 
that is, how to encode the key factors of the PDs, order of the polynomial, the number of 
input variables, and the optimum input variables, into the chromosome. We employ a 
binary coding for the available design specifications. We code the order and the inputs of 
each node in the EISON as a finite-length string. Our chromosomes are made of three sub- 
chromosomes. The first one is consisted of 2 bits for the order of polynomial (PD), the 
second one is consisted of 3 bits for the number of inputs of PD, and the last one is consisted 
of N bits which are equal to the number of entire input candidates in the current layer. 
These input candidates are the node outputs of the previous layer. The representation of 
binary chromosomes is illustrated in Fig. 1. 



Bits in the I s ' 
sub-chromosome 


Order of polynomial(PD) 


00 


Type 1 - Linear 


01 


Type 2 - Quadratic 


10 


11 


Type 3 - Modified quadratic 



Table 1. Relationship between bits in the 1st sub-chromosome and order of PD. 

The 1st sub-chromosome is made of 2 bits. It represents several types of order of PD. The 
relationship between bits in the 1st sub-chromosome and the order of PD is shown in Table 
1. Thus, each node can exploit a different order of the polynomial. 
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1st sub-chromosome 


2nd sub-chromosome 


3rd sub-chromosome 








aa 


a^a 


aaaa- 


1 






Type of 
a partial description 


Number of inputs to 
a partial description 


Input candidates 




Range: 1~3 


Range: 1~5 


1 \ Selected input variable 

\ > among the input candidates 

\ Non-selected input variable 

\ sj among the input candidates 







Fig. 1. Structure of binary chromosome for a PD 

The 3rd sub-chromosome has N bits, which are concatenated a bit of O's and Ts coding. The 
input candidate is represented by a 1 bit if it is chosen as input variable to the PD and by a 
bit it is not chosen. This way solves the problem of which input variables to be chosen. 
If many input candidates are chosen for model design, the modeling is computationally 
complex, and normally requires a lot of time to achieve good results. In addition, it causes 
improper results and poor generalization ability. Good approximation performance does 
not necessarily guarantee good generalization capability. To overcome this drawback, we 
introduce the 2nd sub-chromosome into the chromosome. The 2nd sub-chromosome is 
consisted of 3 bits and represents the number of input variables to be selected. The number 
based on the 2nd sub-chromosome is shown in the Table 2. 



Bits in the 
chromosome 


2nd 


sub- 


Number of inputs to PD 


000 


1 


001 


2 


6io 


2 


011 


3 


100 


3 


101 


4 


110 


4 


111 


5 



Table 2. Relationship between bits in the 2nd sub-chromosome and number of inputs to PD. 

Input variables for each node are selected among entire input candidates as many as the 
number represented in the 2nd sub-chromosome. Designer must determine the maximum 
number in consideration of the characteristic of system, design specification, and some 
prior knowledge of model. With this method we can solve the problems such as the 
conflict between overfitting and generalization and the requirement of a lot of computing 
time. 
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The relationship between chromosome and information on PD is shown in Fig. 2. The PD 
corresponding to the chromosome in Fig. 2 is described briefly as Fig. 3. 

Fig. 2 shows an example of PD. The various pieces of required information are obtained 
its chromosome. The 1st sub-chromosome shows that the polynomial order is Type 2 
(quadratic form). The 2nd sub-chromosome shows two input variables to this node. The 
3rd sub-chromosome tells that X\ and X(, are selected as input variables. The node with PD 
corresponding to Fig. 2 is shown in Fig. 3. Thus, the output of this PD, y can be expressed 
as(l). 

y — j (x t ,x 6 ) — c + CjX ] +c 2 x ft + c 3 Xj +c 4 x 6 + c 5 x l x 6 ,-, 

where coefficients Co, Ci, ..., C5 are evaluated using the training data set by means of the 
standard least square estimation (LSE). Therefore, the polynomial function of PD is formed 
automatically according to the information of sub-chromosomes. 



Input cadidates 



» 



Chromosome 



1st sub- 
--> chromosome 



2nd sub- 
chromosome 



Information on PD 



Forming a PD 



Order of 
polynomial 



— > No. of inputs 



3rd sub- 
chromosome 




Fig. 2. Example of PD whose various pieces of required information are obtained from its 
chromosome. 



'. quadratic 
(Type 2) 




> : 2 inputs 

Fig. 3. Node with PD corresponding to chromosome in Fig. 2. 

Step 4: Check the stopping criterion. 

The EISON algorithm terminates when the 3rd layer is reached. 

Step 5: Determine new input variables for the next layer. 

If the stopping criterion is not satisfied, the next layer is constructed by repeating step 3 

through step 4. 



Advanced Humanoid Robot Based on the Evolutionary Inductive Self-organizing Network 



453 



c 



Start 



J 



Generation of initial population : 

the parameters are encoded into a 
chromosome 



Evaluation : each chromosome is 
evaluated and has its fitness value 




Results : chromosomes which have 
good fitness value are selected for the 
new input variables of the next layer 



The fitness values of the new chromosomes 

are improved trough generations with 

genetic operators 

Reproduction: roulette wheel 



A: 00000001111 



B: 11000110011 



A: 00000001111 
B: 11000110011 



One-point crossover 

:> 



: crossover site 



A': 0000000: 00 1 1 
B : 1 1 00 1 ill 1 1 1 



before crossover 



after crossover 



Invert mutation 



— : mutation site 



A: 00000000011 



A': 000 



10000011 



before mutation 



after mutation 



End : one chromosome (PD) 

characterized by the best 

^performance is selected as the outputy 

when the 3rd layer is reached 



Fig. 4. Block diagram of the design procedure of EISON. 

The overall design procedure of EISON is shown in Fig. 4. At the beginning of the process, 
the initial populations comprise a set of chromosomes that are scattered all over the search 
space. The populations are all randomly initialized. Thus, the use of heuristic knowledge is 
minimized. The assignment of the fitness in evolutionary algorithm serves as guidance to 
lead the search toward the optimal solution. Fitness function with several specific cases for 
modeling will be explained later. After each of the chromosomes is evaluated and associated 
with a fitness, the current population undergoes the reproduction process to create the next 
generation of population. The roulette-wheel selection scheme is used to determine the 
members of the new generation of population. After the new group of population is built, 
the mating pool is formed and the crossover is carried out. The crossover proceeds in three 
steps. First, two newly reproduced strings are selected from the mating pool produced by 
reproduction. Second, a position (one point) along the two strings is selected uniformly at 
random. The third step is to exchange all characters following the crossing site. We use one- 
point crossover operator with a crossover probability of P c (0.85). This is then followed by 
the mutation operation. The mutation is the occasional alteration of a value at a particular 
bit position (we flip the states of a bit from to 1 or vice versa). The mutation serves as an 
insurance policy which would recover the loss of a particular piece of information (any 
simple bit). The mutation rate used is fixed at 0.05 (P m ). Generally, after these three 
operations, the overall fitness of the population improves. Each of the population generated 
then goes through a series of evaluation, reproduction, crossover, and mutation, and the 
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procedure is repeated until a termination condition is reached. After the evolution process, 
the final generation of population consists of highly fit bits that provide optimal solutions. 
After the termination condition is satisfied, one chromosome (PD) with the best 
performance in the final generation of population is selected as the output PD. All 
remaining other chromosomes are discarded and all the nodes that do not have influence on 
this output PD in the previous layers are also removed. By doing this, the EISON model is 
obtained. 

2.2 Fitness function for EISON 

The important thing to be considered for the evolutionary algorithm is the determination of 
the fitness function. The genotype representation encodes the problem into a string while 
the fitness function measures the performance of the model. It is quite important for 
evolving systems to find a good fitness measurement. To construct models with significant 
approximation and generalization ability, we introduce the error function such as 

E = 0xPI + (l-0)xEPI (2) 

where e [0, 1] is a weighting factor for PI and EPI, which denote the values of the 
performance index for the training data and testing data, respectively, as expressed in (5). 
Then the fitness value is determined as follows: 

F=— (3) 

\ + E 

Maximizing F is identical to minimizing E. The choice of establishes a certain tradeoff 

between the approximation and generalization ability of the EISON. 

2.3 Evaluation of the EISON 

We show the performance of our EISON for well known nonlinear system to see the 
applicability. In addition, we demonstrate how the proposed EISON model can be 
employed to identify the highly nonlinear function. The performance of this model will be 
compared with that of earlier works. The function to be identified is a three-input nonlinear 
function given by (4) 

^(l + x^+x^'+x; 1 - 5 ) 2 (4) 

which is widely used by Takagi and Hayashi [10], Sugeno and Kang[ll], and Kondo[12] to 
test their modeling approaches. 

40 pairs of the input-output data sets are obtained from (4) [14]. The data is divided into 
training data set (Nos. 1-20) and testing data set (Nos. 21-40). To compare the 
performance, the same performance index, average percentage error (APE) adopted in 
[10-14] is used. 

APE = —Y ' y '~ y <' xl00 (%) (5) 

where m is the number of data pairs and j^and y t are the i-th actual output and model 

output, respectively. 

The design parameters of EISON in each layer are shown in Table 3. The simulation results 

of the EISON are summarized in Table 4. The overall lowest values of the performance 

index, PI=0.188 EPI=1.087, are obtained at the third layer when the weighting factor (8) is 

0.25. 
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Parameters 


1st layer 


2nd layer 


3rd layer 


Maximum generations 


40 


60 


80 


Population size:( w) 


20: (15) 


60:(50) 


80 


String length 


8 


20 


55 


Crossover rate (P c ) 


0.85 


Mutation rate (P m ) 


0.05 


Weighting factor: 8 


0.1-0.9 


Type (order) 


1-3 



Table 3. Design parameters of EISON for modeling. 

w. the number of chosen nodes whose outputs are used as inputs to the next layer 



Weighting factor 


1st layer 


2nd layer 


3rd layer 


PI 


EPI 


PI 


EPI 


PI 


EPI 


0.1 


5.7845 


6.8199 


2.3895 


3.3400 


2.2837 


3.1418 


0.25 


5.7845 


6.8199 


0.8535 


3.1356 


0.1881 


1.0879 


0.5 


5.7845 


6.8199 


1.6324 


5.5291 


1.2268 


3.5526 


0.75 


5.7845 


6.8199 


1.9092 


4.0896 


0.5634 


2.2097 


0.9 


5.7845 


6.8199 


2.5083 


5.1444 


0.0002 


4.8804 



Table 4. Values of performance index of the proposed EISON model. 
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Generations 



20 40 60 80 100 120 140 161 

Generations 



(a) training result 



(b) testing result 



Fig. 5. Trend of performance index values with respect to generations through layers (8 
=0.25). 

Fig. 5 illustrates the trend of the performance index values produced in successive 
generations of the evolutionary algorithm when the weighting factor 6 is 0.25. Fig. 6 
shows the values of error function and fitness function in successive evolutionary 
algorithm generations when the 6 is 0.25. Fig. 7 depicts the proposed EISON model with 
3 layers when the 8 is 0.25. The structure of EISON is very simple and has a good 
performance. 
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20 40 



80 100 120 140 

Generations 



20 40 



80 100 120 140 160 181 

Generations 



(a) error function (E) (b) fitness function (F) 

Fig. 6. Values of the error function and fitness function with respect to the successive 
generations (8 =0.25). 




Fig. 7. Structure of the EISON model with 3 layers (6 =0.25). 

Fig. 8 shows the identification performance of the proposed EISON and its errors when the 
8 is 0.25. The output of the EISON follows the actual output very well. 

Table 5 shows the performance of the proposed EISON model and other models studied in 
the literature. The experimental results clearly show that the proposed model outperforms 
the existing models both in terms of better approximation capabilities (PI) as well as superb 
generalization abilities (EPI). 







APE 




PI(%) 


EPI (%) 




GMDH model[12] 


4.7 


5.7 


Fuzzy model 
[11] 


Model 1 


1.5 


2.1 


Model 2 


0.59 


3.4 


FNN [14] 


Type 1 


0.84 


1.22 


Type 2 


0.73 


1.28 


Type 3 


0.63 


1.25 


GD-FNN [13] 


2.11 


1.54 




0.188 




EISON 


1.087 



Table 5. Performance comparison of various identification models. 
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(a) actual versus model output of training data set (b) errors of (a) 



— Actual output 

- Model output 
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(c) actual versus model output of testing data set (d) errors of (c) 

Fig. 8. Identification performance of EISON model with 3 layers and its errors 



3. Practical Biped Humanoid Robot 

3.1 Design 

Biped humanoid robot designed and implemented is shown in Fig. 9. The specification of 
our biped humanoid robot is depicted in Table 6. The robot has 19 joints and the height and 
the weight are about 445mm and 3000g including vision camera. For the reduction of the 
weight, the body is made of aluminum materials. Each joint is driven by the RC servomotor 
that consists of a DC motor, gear, and simple controller. Each of the RC servomotors is 
mounted in the link structure. This structure is strong against falling down of the robot and 
it looks smart and more similar to a human. 



Size 


Height : 445mm 


Weight 


3kg 


CPU 


TMS320LF2407 DSP 


Actuator 

(RC Servo motors) 


HSR-5995TG (Torque : 30kg cm at 7.4 V) 


Degree of freedom 


19 DOF (Leg+Arm+ Waist) = 2*6 + 3*2+1) 


Power source 
Actuator 


Battery 


: AA Size Ni-poly (7.4V, 1700mAh ) 


Control board 


: AAA size Ni- poly (7.4V, 700mAh) 



Table 6. Specification of our humanoid robot 
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Fig. 9. Designed and implemented humanoid robot. 
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Fig. 10. 3D humanoid robot design and its practical figures 

Front and side view of 3D robot and its practical figures are shown in Fig. 10. Block diagram 
of the biped walking robot and its electric diagram of control board and actuators are also 
shown in Figs. 11 and 12, respectively. Implemented control board and its electric wiring 
diagram schematic is presented in Fig. 13. 
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CONTROL BOARD j 1 ACTUATORS j 
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PERSONAL COMPUTER 












Command : 











Fig. 11. Block diagram of the humanoid robot 
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Fig. 12. Electric diagram of control board and actuators 
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Fig. 13. Implemented control board and its electric wiring diagram schematic 



3.2 Zero moment point measurement system 

As an important criterion for the stability of the walk, the trajectory of the zero moment 
point (ZMP) beneath the robot foot during the walk is investigated. Through the ZMP, we 
can synthesize the walking patterns of biped walking robot and demonstrate walking 
motion with real robots. Thus ZMP is indispensable to ensure dynamic stability for a biped 
robot. 
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Fig. 14. Representation of joint angle of the 10 degree of freedoms. 

The places of joints in walking are shown in Fig. 14. The measured ZMP trajectory data to be 
considered here are obtained from 10 degree of freedoms (DOFs) as shown in Fig. 14. Two 
DOFs are assigned to hips and ankles and one DOF to the knee on both sides. From these 
joint angles, cyclic walking pattern has been realized. This biped walking robot can walk 
continuously without falling down. The zero moment point (ZMP) trajectory in the robot 
foot support area is a significant criterion for the stability of the walk. In many studies, ZMP 
coordinates are computed using a model of the robot and information from the joint 
encoders. But we employed more direct approach which is to use measurement data from 
sensors mounted at the robot feet. 

The type of force sensor used in our experiments is FlexiForce sensor A201 which is shown 
in Fig. 15. They are attached to the four corners of the sole plate. Sensor signals are digitized 
by an ADC board, with a sampling time of 10 ms. Measurements are carried out in real time. 




Fig. 15. Employed force sensors under the robot feet. 
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The foot pressure is obtained by summing the force signals. By using the force sensor data, it 
is easy to calculate the actual ZMP data. Feet support phase ZMPs in the local foot 
coordinate frame are computed by equation 6 



■ ' = 1 
8 

1/ 



(6) 



where /i is each force applied to the right and left foot sensors and n is sensor position which 
is vector. 



4. Walking Pattern Analysis of the Humanoid Robot 

The walking motions of the biped humanoid robot are shown in Figs. 16-18. These figures 
show series of snapshots in the front views of the biped humanoid robot walking on a flat 
floor, some slopes, and uneven floor, respectively. Fig. 16 gives the series of front views of 
this humanoid robot walking on a flat floor. In Fig. 17 depict the series of front views of this 
humanoid robot going up on an ascent. Fig. 18 shows another type of walking of biped 
humanoid robot, which is walking motion on an uneven floor. 




Fig. 16. Side view of the biped humanoid robot on a flat floor 




Fig. 17. Side view of the biped humanoid robot on an ascent. 
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Fig. 18. Side view of the biped humanoid robot on an uneven floor. 

Experiments using EISON was conducted and the results are summarized in tables and 
figures below. The design parameters of evolutionary inductive self-organizing network in 
each layer are shown in Table 7. The results of the EISON for the walking humanoid robot 
on the flat floor are summarized in Table 8. The overall lowest values of the performance 
indicies, 6.865 and 10.377, are obtained at the third layer when the weighting factor (8) is 1. 
In addition, the generated ZMP positions and corresponding ZMP trajectory are shown in 
Fig. 19. Table 9 depicts the condition and results for the actual ZMP positions of our 
humanoid walking robot on an ascent floor. We can also see the corresponding ZMP 
trajectories in Fig. 20, respectively. 



Parameters 


1st layer 


2nd layer 


3rd layer 


Maximum generations 


40 


60 


80 


Population size:( w) 


40:(30) 


100:(80) 


160 


String length 


13 


35 


85 


Crossover rate (P c ) 


0.85 


Mutation rate (P m ) 


0.05 


Weighting factor: 


1 


Type (order) 


1-3 



Table 7. Design parameters of evolutionary inductive self-organizing network. 
w: the number of chosen nodes whose outputs are used as inputs to the next layer 



Walking condition 
slope (deg.) 



Layer 



MSE (mm) 



x-coordinate y-coordinate 



9.676 
7.641 
6.865 



18.566 
13.397 
10.377 



Table 8. Condition and the corresponding MSE are included for actual ZMP position in four 
step motion of our humanoid robot. 
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Fig. 19. Generated ZMP positions and corresponding ZMP trajectories (0 ). 



5. Concluding remarks 

This chapter deals with advanced humanoid robot based on the evolutionary inductive 
self-organizing network. Humanoid robot is the most versatile ones and has almost the 
same mechanisms as a human and is suitable for moving in an human's environment. 
But the dynamics involved are highly nonlinear, complex and unstable. So it is difficult 
to generate human-like walking motion. In this chapter, we have employed zero 
moment point as an important criterion for the balance of the walking robots. In 
addition, evolutionary inductive self-organizing network is also utilized to establish 
empirical relationships between the humanoid walking robots and the ground and to 
explain empirical laws by incorporating them into the humanoid robot. From obtained 
natural walking motions of the humanoid robot, EISON can be effectively used to the 
walking robot and we can see the synergy effect humanoid robot and evolutionary 
inductive self-organizing network. 
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Fig. 20. Generated ZMP positions and corresponding ZMP trajectories (10 ). 
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1. Abstract 

One of the most important tasks in biped robot is the balance-keeping control. A question 
arisen as how do our human beings make the balance-keeping possible in upright standing 
as human beings are the only biped-walking primates, which takes several million years of 
evolution period to achieve this ability. Studies on humans' balance-keeping mechanism are 
not only the work of physiologists but also a task of robot engineers since bio-mimetic 
approach is a shortcut for developing humanoid robot. This chapter will introduce some 
research progresses on balance-keeping control in upright standing. We will introduce the 
physical characteristics of human body at first, modeling the physical system of body, 
establishing a balance-keeping control model, and at last applying the balance-keeping 
ability assessment for falls risk prediction. We wish those studies make contributions to 
robotics. 

2. Introduction 

Scientist's interest and fascination in balance control and movement has a long history. 
Leonardo da Vinci emphasized the importance of mechanics in the understanding of 
movement and balance, wrote that "Mechanical science is the noblest and above all the most 
useful, seeing that by means of it all animated bodies which have movement perform all 
their actions! 1 ]". In 1685, by using a balance board, Italian mathematician Johannes 
Alphonsus Borelli, in his "De motu animalium", determined the position of the center of 
gravity as being 57% of the height of the body taken from the feet in the erect position. 
Those early studies on human body mechanics determining the inertial properties of 
different body segments serve an important and necessary role in modern biomechanics. 
Not like static upright stance in biped robot, upright standing in human is a high-skill task 
needed a precise control. In 1862 Vierordtand later Mosso (1884) demonstrated that normal 
standing was not a static posture but rather a continuous movement. Kelso and Hellebrandt 
in 1937 introduced a balance platform to obtain graphic recording of the involuntary 
postural sway of man in the upright stance and to locate the centre of weight with respect to 
the feet as a function of time. Using a force analysis platform and an accelerometer, 
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Whitney! 2 ! (1958) concluded that the movement of the center of foot pressure must 
exaggerate the accompanying movement of the center of gravity of the body mass. Based on 
those studies, postural sway is regarded as presenting at the hip level suggests that the 
trunk rotates relative to the limbs during standing. 

In other hand, clinical significance of postural sway was first observed by Babinski in 
1899. He noted that a patient with the disorder termed "asynerhie cerebelleuse" could not 
extent his trunk from a standing position without falling backwards. He concluded that 
the ankle must perform a compensatory movement to prevent the subject from falling 
backwards. Thus, Babinski was one of the first to recognize the presence of an active 
postural control of muscular tone and its importance in balance control and in voluntary 
movements. 

Modern theory on postural control was established on the work of Magnus! 3 ! and De 
Kleijnl 4 ]; they proposed that each animal species have a reference posture or stance, which is 
genetically determined. According to this view, postural control and its adaptation to the 
environment is based on the background postural tone and on the postural reflexes or 
reactions, which are considered to originate from inputs from the visual! 5 I, vestibular! 6 ! and 
proprioceptive! 7 ! system. The gravity vector is considered to serve as a reference frame. The 
center nervous system needs to accomplish two tasks related to the force of gravity. First, it 
must activate extensor muscles to act against the gravity vector, creating postural tone. 
Second, it must stabilize the body's centre of gravity with respect to the ground. 
Many studies! 8-10 ! on balance-keeping control have been published in recent two decades. A 
lot of theoretical control models have been proposed for elucidating the body sway 
phenomenon during upright standing. Among them, Inverted pendulum is the most 
popular model for body sway analysis. However, those studies are seldom considered 
individual's physical conditions, and one-link inverted pendulum is dominated. However, a 
practicable control model for humans' balance-keeping ability assessment is still 
unavailable. For this reason, we proposed a PID model of upright standing, and a reality- 
like body sway was simulated. This chapter presents some recent research results in our 
laboratory in balance-keeping control and emphasizes its application for fall risk prediction 
in fall-prone subjects. 

3. Upright Standing and Body Sway 

Force-plate is a popular device for postural sway measurement! 11 !. A force-plate usually 
installs three or four force sensors for ground reaction force measurement. The center of 
ground reaction pressure (COP) can be calculated automatically by a computer program. 
However, COP is not equal to body sway because body's inertia is an important factor 
which influences the deviation of COP! 12 I. We developed an optical measuring system, 
which can directly record the trunk sway. 

The body sway-measuring system was designed to record multi-channel video signal 
simultaneously (Fig.l). This device included three high-resolution CCD video cameras and 
one personal computer that were used for video signal recording and image processing. 
Three markers (white ball, 3.0cm in diameter) were used for imaging recognizing with one 
being put on subjects' back and two being put on legs 10cm above about the knee joint, and 
on the floor, a force-plate was installed for COP deviation assessment. During the 
measuring, subjects were told to stand on the force-plate and glance at a marker put on the 
front wall in the same level of their eyes. 
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■COP & EMG Recording PC 
^ Postural Sway Analyzing PC 

Fig.l. The scheme shows of the body sway-measuring device. The device consists of three 
high-resolution CCD cameras (Caml, Cam2 and Cam3) and a whiteboard. A white ball 
with diameter of 1.0cm was put on the whiteboard in subject's eye level. During the 
measurement, subjects were asked to fix their eyes on the ball. Postural sway analyzing was 
executed on one desktop PC, and an EMG recording PC was also installed too. 

In this study, body sway in lateral direction was recorded and the body was regarded as 
two-link inversed pendulum. The motion of the first link was on ankle and the second link 
was on lumbosacral. The angular sway of the first link was measured as the averaged value 
of the two legs and the angular sway of the second link was calculated indirectly as shown 
in Fig.2, where point O is the ankle joint and point A is the lumbosacral joint. PI represents 
the marker of legs and P2 represents the marker of subject's back. There have 



Xj = /] sin 6 X 



x 2 = L x sin X + L sin(#, + 6 2 ). 



(1) 
(2) 



Because the angular deviation scope is relatively small (usually < 2.0 degree), then we set 
sin 9 X = 6 X , sin(#j + 2 ) = 6 X + 6 2 approximately, and 2 can be calculated as equation (3), 



e 2 = ( 



j— tt ~\ if i-i 



-)(i+f). 



(3) 



Here, we defined the height of lumbosacral as L, which is the distance from floor to 5 th 
lumbar spine. The entire calculations can carried out online. Subjects kept a static upright 
stance on the force-plate with eyes open or eyes closed. For minimizing psychological 
influence, subjects were asked to numerate on mind while doing standing task. 
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Fig.2. Relation of the sway angles during static upright stance. Points PI and P2 are the 
positions of the markers, their coordinates can be calculated by the CCD video camera 
measuring system. Points O and A represent are the ankle joint and lumbosacral joint 
respectively. Q is the angular sway of the ankle joint, and ff is the angular sway of the 

lumbosacral joint. The postural sway is considered only in coronal direction. 

A study from eight healthy young subjects! 13 ! shown that the body sways scope of ankle and 
lumbosacral were 0.94±0.36degree (eye-open), 1.35±0.52degree (eye-closed) and 
0.99±0. 41 degree (eye-open), 1.27±0.72degree (eye-closed), respectively. No significant 
difference existed between the ankle and lumbosacral. The ankle and lumbosacral sway 
approximately in the same degree during the static upright stance. Further more, Fourier 
transform of the sway time series showed that the phase differences between ankle and 
lumbosacral were approximately equal to 7t , i.e. i,e, ankle sways in opposite direction to 
the lumbosacral. The results help for establishing a structural inverted pendulum model. 
We also analyzed the relationship between trunk sway and deviation of COPI 12 !. By setting 

y{l) as the deviation of COP, U\t) as the trunk sway, an approximate equation can be 

acquired, yif) ~ ku(t) + hu{f) , where, k,h are constants. The result proven that COP 
deviation is different with trunk sway, and body's inertia effect does added on the effect of COP 
deviation as Whitney pointed out! 5 ! 
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4. Musculoskeletal System of Human Body 

From the viewpoint of structural specificity of the human body, pelvis and ankle are two 
major structures that play a pivotal role in balance control, the pelvic strategy and ankle 
strategy as hereby defined. 

Human pelvis is composed of four irregular bones: two hip bones laterally and in front the 
sacrum and coccyx behind. Sacrum articulated with vertebral column formed lumbosacral 
joint, and also make joint with two femurs formed hip joints (Fig.3a). Muscles associated 
with lumbosacral are mainly ascribed to two pairs: posas major (PM) and glutaeus medius 
(GM), and in addition, also the erectors. In fact, many muscles surround hip joint are 
involved in the joint movement, and because the synergic effect! 14 - 15 I of muscles' activity, it 
is difficult to deal each individual muscle separately. For model's simplicity, two pairs of 
muscles are selected for representing the total muscles' effects in the lateral direction 
movement (Fig.3a), i.e. PM and GM. It is regarded that the movement of lumbosacral is 
controlled by PMs and GM. This assumption can well explain the relationship between GM 
contraction and COP deviation in coronal direction. 

The characteristic structures of femur are its head, greater trochanter and lesser trochanter. 
The greater trochanter is a large, irregular, quadrilateral eminence, situated at the junction 
of the neck with the upper part of the body, serves as the insertion of the tendon of the GM. 
The Lesser Trochanter is a conical eminence, projects from the lower and back part of the 
base of the neck, gives insertion to the tendon of the PM. The shape of femur looks like a 
letter of "Y" (Fig. 3b). Based on the structural characteristics of pelvis, lumbosacral, hip and 
femurs an upright body model was constructed (Fig.3c). In this model, pelvis is expressed as 
a triangle connecting vertebral column and femurs with lumbosacral joint and hip joint 
respectively and driven by two symmetrical pairs of actuators, the PM and GM, form a 
closed multi-link system. In order to make the dynamics analysis concisely, the distance 
between two feet was supposed equal to the distance between two hip joints. Thus, the aim 
of central nerve system is to keep the angles Q and to be zero, i.e. to keep the body 

upright (Fig.3c). 

From the structural model of body the position of vertical projection of center-of-mass 

(COM) defined as y which can be calculated as equation (4). 

r com V 1 ^ ' 

When \y I > j , upright standing is impossible and falls may happen. In other words, 
upright stance is possible only when 

\Vj\*d. (5) 

Because the angular sway scope in ankle is approximately equal to the sway scope in 
lumbosacral, and their phase difference is n , the relationship of sway angles between ankle 
and lumbosacral can be considered as (Fig. 3c) 

e,=-e 2 . (6) 

Equation (6) also indicates that the trunk of body is always keeping perpendicular to the 
horizon. Based on this fact the structure model of body can be simplified as be shown in 

Fig.4. 
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In this model, PM and GM are the actuators keeping the inverted pendulum to be balanced. 
By intuition, right GM and left PM should activate simultaneously, which is an agonist 
against left GM and right PM. Its dynamics can be deduced by Lagrangian equation as 
equation (7). 

T = [2mJ 2 + (m 2 +m 3 )l l 2 ]8 — [2mJ + (m 2 + m 3 )l{\gSind (7) 

( T : The torque generated from activities of GM and PM) 




Fig.3. The pelvic anatomical structure and its responding structure model during static 
upright stance. The pelvic is fused by four bones, the sacrum, ilium, pubis and ischium, 
which linked with 5 th lumbar spine and femur through lumbosacral and hip joint (a). The 
femur includes a head that is articulated with ilium and a greater trochanter and lesser 
trochanter where the GM and PM terminated (b). The structural model of pelvic is schemed 

(c). f,,f 2 are the forces produced by PM, and f. f are produced by GM. m,,m 2 and fYl-^ 

represented the mass of lower leg, pelvic and upper body respectively. Q is the sway angle 

of leg and Q is the sway angle of upper trunk. 
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Fig.4. A simplified pelvic structural model during static upright stance. A, B are the masses 
of leg, C is the mass of pelvic and D is the mass of upper trunk. Because the lumbosacral 
always sways in inverse direction of the ankle joint with the same value of 9 , the upper 
trunk is kept perpendicular to the horizon. 



5. The Roles of Foot in Balance Control 

Feet are special structure that evolved fit for standing and locomotion. The anatomy 
structure of human feet is quite complex which make up of a doze of muscles and near 
thirty bones. When subject stand on one foot, brisk EMG activities can be recorded on the 
surface of muscles around the ankle joint! 16 ], by contrast, nearly no EMG activities can be 
recorded while standing on two feet. Using a force sheet to measure COP deviation, we 
recorded the surface EMG activities of m. tibialis anterior (TA), m. peroneus longus (PL) 
and Caput mediale m. gastrocnemii (MG) when asked subjects standing on one foot. The 
relationship between COP deviation and muscular activities was identified! 13 ! (Fig. 5). 
Based on the experimental data we established a model of muscular control of the COP 
deviation (Fig. 6). Briefly, the ground reaction under the foot can be concentrated to three 
points; one is situated on the heel the other two are situated on the root of thumb and 
little figure respectively, which making up a triangle. The ground reaction forces acting 
on the vertex of the triangle are controlled by TA, PL and MG respectively as shown in 
the figure 6, which keeping the COP located in the scope of the triangle to make the 
inverted body balanced. 
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Fig.5. Showing COP deviation and surface EMG recordings from TA, PN and TS when 
standing on one foot. The COPx (coronal section) is closely related to TA and PN 
contraction. 
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Fig.6. The model of COP control when standing on one foot. The scheme shows how the force 
under the three points is controlled by muscular contraction. Supposing the forces act on the 
three points as f f and f , then the muscles bear the responsibility should be PN, TA, and 

TC respectively. The three muscles are innerved by neurons located at the L4-S2 level. 



Balance-Keeping Control of Upright Standing in Biped Human Beings and 
its Application for Stability Assessment 



475 



Nervous commands come from cerebral cortex reached to alpha motor neurons of the spinal 
anterior column to control the muscles around ankle joint to keep the body balanced. This 
kind of balance-keeping control is ankle strategy of balance control. In contract, nervous 
commands come from cerebrum to control the muscles around pelvis joints to keep the 
body balanced, hereby defined as pelvis strategy. It seems that the balance-keeping strategy 
is transformed from ankle to pelvis when the standing posture changes from one foot to two 
feet of upright standing. 

Generally, the contact areas of the feet and ground play a key role in the static upright 
standing, with the more contact areas the feet have, the more stable of the postureI 17 I Will be. 
The contact surface is a polygon. Their vertexes are the contact points receive ground 
reaction force, and the area of the polygon was defined as effective contact area (Fig. 7). 
What important is the physical stability of the inverted body, which determined by the 
height of the COM and the effective contact area of polygon. By defining the effective 
contact areas as S , the body mass as ttl and the height of COM as h, then C = mgh I \[s is an 
constrained value of the upright standing ability. In human beings, when standing on one 
foot, S value is approximately equal to 3000 cm 2 , thus the C value reaches up to 10,000 
Newton, which is far larger than the biped robot in the world in current time. 




B« 



; 



IFD 



/ 



Fig. 7. Scheme shows the effective contact area of support when standing in a static upright 
stance. The area is changed with different inter-foot-distance (IFD). 



6. PID Model of Balance Control 

As upright standing often modeled as an inverted pendulum, a feedback balance control is 
speculated to stabilize the pendulum system! 18 - 19 1. Similar to control systems of any artifacts, 
balance-keeping control of human body is composed of sensors, actuators and a controller. 
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Physiologically, at least three sensory organs are contributed to balance-keeping, i.e. the 
vestibular organ! 5 !, eyes! 6 ! and proprioceptors PI of muscles. Those sensors detect the gravity 
vector or body kinetic state and send information to the central nervous system (CNS). 
Actuators are muscles which affiliated to various body parts to produce torques. As 
mentioned ahead two pairs of muscle in pelvis, GM and PM are especially important in 
balance-keeping control (Figure 3a). The CNS works as a controller which controls muscles 
of whole body to produce torque needed for balance-keeping. Up now, the mechanism of 
body balance control is not clear clarified. 

The body sway can be viewed as an output of body control process of humans' brain, which 
provides an interesting clue for investigating inner balance-keeping control mechanism. 
Using a classical control theoretical approach, a multi-link model of human body in coronal 
section as shown in Figure 3 has been constructed. The PID controller is modeled for the 
body's balance-keeping controller in upright standing. The dynamics of upright standing 
body is expressed as equation (7), the controller is 

e(t) = r(t)-0(t) (r(0 = 0) (8) 

de{t-t d ) 



T{t) = K p e{t-t d ) + K D - 



dt 



-K,[e(t-t d )dt, (9) 



where t d is the time lag, K ,K ar >d K, are unknown gains of the PID control. By 

combining the Eq. (7) and the Eq. (9), and set Sin 6(f) = 6{t) yields 
[2mJ g +(m 2 +m 3 )l l ]8(t) — [2mJ g +(m 2 -\-m^)l x ]g&(t) 

The overall system is depicted in Figure 8. 



dt 



-K,[e{t-t d )dt. (10) 
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Fig.8. A block diagram of humans' static upright stance control. Because the aim of standing 
is to keep the body upright, the reference value of postural sway angle is set to zero. The 
central nervous system detects the error signal and sends output signal to muscles to keep 
the body upright, a state of equilibrium. 
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One task is to estimate its parameters of the PID model according to the measurement of 
body sway. This is a typical closed-loop identification problem to estimate controller 
parameters instead of plant parameters. Here, we take a modified lest square method for 
PID parameters estimation since a time lag parameter t, is included in Eq. (10). The 

parameters of K p , K n ,K, and t can be identified based on the observation of 0. We use a 

linear regression scheme by rewriting the Eq. (10) as 

y(t) = K p u 1 (t-t d ) + K D u 2 (t-t d ) + K,u i (t-t d ), (11) 

(y = [2mJ 2 g + (m 2 + m 3 )lf]0(t)-[2m,l s + {m 2 +m 1 )l l ]g0(t), 

u l (t-t d ) = e(t-t d ), 

de(t-t d ) 



u 2 (t-t d )-- 
u^t-tj)-- 



dt 
e(t-t d )dt) . 

Here, the y(t) ,U x {t — t d ) ,U 2 {t — t d ) and U 3 (t — t d ) are measurable. Rewriting Eq. (11) 



y = Dk + e, (12) 

here, y,e are mxl vectors, k = [K p K D Kj]' and Q. is a mx3 matrix (m>3), then the 
parameters can be calculated as: 



K p 



= A~ 



(y(t),u 2 (t-t d )) 
(y(t),u,(t-t d )) 



where, A = D'D . (,) is inner product. Lets S - 
satisfy 

= 0. 



dS_ 
dt r , 



■■HO* 



(14) 



(13) 



t , can be estimated because it 



While the parameters are identified, selecting an initial function such as 
0{t) = 0.01xsin(?) , {-t d < t < )/ the body sway can be simulated. Figure 9a shows one of the 

simulated body sway with parameters estimate from the experimental data of a 37 years old 
healthy male. When a white noise is added to the input in this simulation, the time series of 
the simulated body sway are similar with the actual recorded in its amplitude and 
frequency (Figure 9b). This proved that the plant/ controller model captures an essential 
mechanism of body sway and balance-keeping control. 

As considering the sensor noise is input as shown in Figure 8 of sin(ft#) , the total 
transfer function from sensor noise to output (Fig. 10) becomes 



T(S): 



c,M 



Gfi 2 e-"' +1 
K D s 2 +K p s + K 
s 
1 



G 2 (s)-- 

2 Is 2 + G 

(I = 2m,/ 2 + (m 2 + m, )l 2 ;G = 



(15) 



(16) 



(17) 



-[2mj/ g + (m 2 + m 3 )/, }g;Sin0 = 0). 
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Its frequency response can be calculated, the gain |r(_/fi;)| becomes a function of CO and t d 

as show in Figure 11. The peaks of gain appeared in both eyes open and eyes closed are 
consistent with the experimental recorded. The results suggest that the PID model is 
reasonable and useful for balance-keeping control analysis. 
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Fig. 9. Simulated and the Experimental recorded body sway in eyes open, time record (a) 
and frequency response (b). The subject is a 37-year old male. The body height is 1.65m, and 
body weight is 65Kg. K i K i K an d t are estimated from the experimental data as 

519.0Nm/rad., 72.3Nms/rad., 3.0 Nm/rad.s and 0.11s respectively. The frequency 
expression of simulated is similar to the experimental recorded. 
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Fig. 10. While considering the sensory noise of u = smax as the input, the PID model can 
be simplified as three processes, i.e. a time-lag process, controller of Gl and dynamics of 
G2. 
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Fig. 11. The frequency response of the balance keeping control system simulated from the 
same subject. The gain is a function of CO and t ■ The peaks of gain in eyes open (0.06Hz) 

and eyes closed (0.16Hz) are different, which is roughly consistent with the experimental 
data as a peak of 0.02Hz in eyes open and a 0.12Hz in eyes closed. 



7. Balance-Keeping Ability and fall prediction 

Human beings' balance-keeping ability is developed in different age! 20 !. It is known that in 
age of 20 to 21 years, individual's balance-keeping ability get to his/her peak, and then 
decreased with aging! 21 ! One of main causing factors leads to falls happening in old subjects 
is the deterioration of the balance-keeping ability! 22 ]. Because of this fact balance-keeping 
ability assessment becomes the first step for falls-happen prediction. Up to now, several 
versions of individual's balance-keeping procedures have been published! 23 !, however 
satisfied and practicable method which can be used for falls-happen prediction is still 
unavailable. One reason of these failures attributed to the influence of individual's physical 
status on the balance-keeping ability assessment. 

Previous works have shown that mean velocity of COP ( V COP) is modified by a lot of 
factors including sex! 24 !, aging, body weight! 25 !, height! 26 !, and also to subjects' grasping 
power! 27 !. These factors make the interpretation of the results uneasy when inter-individual 
comparisons are considered. An effort to adjust the results of COP examination by different 
age groups has been conducted and its results has been using for ordinary clinical 
practice! 21 !. However, subjects' physical characteristics, an important factor which affect the 
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result of COP examination, are still not under specific consideration, even though a lot of 
models have been proposed! 28 - 29 1. We introduced an index of averaged angular velocity 
(TSS) for stability estimation! 30 ] and it was proven that TSS is a useful index. Based on a 
study on 68 subjects, we found that increased in body weight and height tended to decrease 
the body sway but aging increased body sway. The reciprocal of TSS, defined as trunk sway 

time (TST), kept a close correspondence with the V COPI 30 !. To reduce the influence of 
physical characteristics, especially the height on the measuring results, an empirical 
mathematic model had been introduced as 

sjmxl" 



TST=kx- 



6 + B 



.(18) 



where m is body weight, / is height and k,6 are constants, B defined as Itt -1 ,) , y is age 

(year). This model fit the recorded data quite well. 

Another interesting result acquired from a study on aging effect on the PID controller. 

Parameters identification showed that K D is deceased with aging! 31 ! (Fig.12). The result 

suggests a promising method for individual's balance-keeping ability assessment in the 
future. 
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Fig.12. x values take from 102 healthy elderly subjects (averaged 64.1±16.3 years old). The 
X is normalized as KdI=K I mgh ( m '- body weight, g: gravity acceleration, h: body 
height). The result shows the K D decreased with aging just like the equilibrium ability. 

A consequences of falls may be serious at any age, but to the elderly they have significant 
beyond that in younger people. These consequences can be direct physical effects such as 
injury or death, or less direct effects of increased dependency and impaired self-care, 
leading to demands on carers, or profound psychological effects. It is known that postural 
stability is related to the incidence of falls in elderly. Individuals with poor postural stability 
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show high fall incidence. When defining the measured trunk-sway-time as TST. , the value 
of r . =TST I TST represents individual's postural stability. From a study of 51 older health 
subjects! 32 !, the relationship between individual's R . value and the incidence of falls was 
investigated. Regressive analyses showed that the value of R and the incidence of falls 
have a reverse relationship, when r <1.12 the falls incidence was significantly higher than 
r >1.12. The results suggest that falls are predictable byff assessment, and this 
procedure appeared a useful means for fall-prevention especially in elder people. 

8. Vision influences on Balance Control 

Keeping upright stance is a basic task for other complex motions such as locomotion and 
running. The mechanisms of visual information benefit for motor control are still unknown. 
We investigated the visual influence on balance-keeping control in upright standing! 33 !. Ten 
healthy subjects (male 4, female 6, aged 37 .7 ±7 .21 years) take part in the study. Four kinds of 
visual stimulation patterns are designed (3 for central visual field stimulation, one is eyes 
closed, Fig.13). 




Fig.13. A shows a simplified body biomechanics structural model during static upright 
stance. ®, ® are the masses of leg, © is the mass of pelvic and © is the mass of upper 
trunk. Because the lumbosacral always sways in inverse direction of the ankle joint with the 
same value of , the upper trunk is kept perpendicular to the horizon. B is the block 
diagram of humans' static upright stance control. Because the aim of static upright standing 
is to keep the body upright, the reference value of postural sway angle is set to zero. The 
central nervous system detects the error signal and sends output signal to muscles to keep 
the body upright, a state of equilibrium. C shows a simplified block diagram of upright 
stance control. Here, the sense noise (Sin(&#) ) is looked as an input, and the output is y , 
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the deviation of the body's center-of-mass. Gl is the transfer function of PID controller, and 
G2 is the transfer function of inverted pendulum model. 

The results showed that the gain of the PID parameter K D is decreased significantly in eyes 
closed (131.5±37.6Nms/rad in eyes open and 90.4±26.0Nms/rad in eyes closed , p<0.001, 
Fig.14), however, K p and Kj are unchanged. Simulation results also proved that when 

decreasing the gain of K D locus of simulated is more like the measured spontaneous body 

sway in eyes closed. The results suggested environmental visual cue is important for 
balance-keeping control, and this effect is pattern-dependent. Of cause, angular velocity is 
also increased when eyes are closed (Fig. 15). 
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Fig.14. Averaged K D values of the 10-subject decreased from A to D. No significant 

differences was found between A and B, however, others showed significant difference. **: 
p < 0.01. 
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There are two hypothesizes have been proposed about the mechanism of the visual effect on 
balance-keeping. One regards that information coming from proprioception of extra-ocular 
muscles is important for balance-keeping control! 34 ]. The other theory is "retinal slip" 
insisted that images slip on the retinal is used as a cue for balance-keeping controU 35 I. Our 
present studies agreed with the retinal slip hypothesis. 
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Fig.15. Averaged values of the TSS in different visual stimulations are increased from A to 
D. However, no significant difference between A and B. ***: p < 0.001. 



9. Summary 

Upright standing is a simple and basic posture of human beings. However, the relatively 
large mass of the upper body and its elevated position in relation to the area of support 
during standing accentuate the importance of an accurate control of trunk movements for 
the maintenance of equilibrium. The kinematics and control strategy of the central nervous 
system have been studied in recent decade, which brought a PID control algorithm to model 
the balance-keeping in upright standing and had successfully interpreted the phenomenon 
of spontaneously body sway. Modeling the human body as two-link inverted pendulum 
system, we successfully identified parameters of individual's PID parameters and make this 
model analyzable and practicable. The simulation results, both of the body sway and the 
spectral response, are quite consistent with experimental data. This proved that the PID 
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model is a reasonable and a useful method as well as by measuring the averaged angular 
velocity. Both of the two methods help for falls prediction, and become a promising method 
for falls prevention. 

Many authors have argued that complex architectures including feedforward/feedback is 
necessary for the maintenance of upright stance, however, our studies together with some 
other recent studies have shown that a model based primarily on a simple feedback 
mechanism with 120-ms to 150-ms time delay can account for postural control during a 
broad variety of disturbance! 36 ]. Also, one interesting result is that k is a key parameter 

related to individual balance keeping ability. Since k is not just influenced by visual cue 

but also sensitive to aging. It seems that human balance keeping ability is mainly 
determined by the gains regulation of fc , and still there have much works to be done in the 

future. 
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1. Introduction 



Recently, the psychological point of view that grants the body a more significant role in 
cognition has also gained attention in artificial intelligence. Proponents of this approach 
would claim that instead of a 'mind that works on abstract problems' we have to deal with 
and understand 'a body that needs a mind to make it function' (Wilson, 2002). These ideas 
differ quite radically from the traditional approach that describes a cognitive process as an 
abstract information processing task where the real physical connections to the outside 
world are of only sub-critical importance, sometimes discarded as mere 'informational 
encapsulated plug-ins' (Fodor, 1983). Thus most theories in cognitive psychology have tried 
to describe the process of human thinking in terms of propositional knowledge. At the same 
time, artificial intelligence research has been dominated by methods of abstract symbolic 
processing, even if researchers often used robotic systems to implement them (Nilsson, 
1984). 

Ignoring sensor-motor influences on cognitive ability is in sharp contrast to research by 
William James (James, 1890) and others (see (Prinz, 1987) for a review) that describe theories 
of cognition based on motor acts, or a theory of cognitive function emerging from seminal 
research on sensor-motor abilities by Jean Piaget (Wilson, 2002) and the theory of 
affordances by (Gibson, 1977). In the 1980s the linguist Lakoff and the philosopher Johnson 
(Lakoff & Johnson, 1980) put forward the idea of abstract concepts based on metaphors for 
bodily, physical concepts; around the same time, Brooks (Brooks, 1986) made a major 
impact on artificial intelligence research by his concepts of behavior based robotics and 
interaction with the environment without internal representation instead of the sense- 
reason-act cycle. This approach has gained wide attention ever since and there appears to be 
a growing sense of commitment to the idea that cognitive ability in a system (natural or 
artificial) has to be studied in the context of its relation to a 'kinematically competent' 
physical body. 

Among the most competent (in a multi functional sense) physical bodies around are 
certainly humans, so the study of humanoid robots appears to be a promising field for 
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understanding the mechanisms and processes involved in generating intelligence in 

technical systems. 

In the following we will give an overview of the field of humanoid robot research. 

2. State of the Art Humanoids 

A review on humanoid robot systems, cannot be made without bearing in mind that many 
of the current developments concentrate on one or the other feature of human performance. 
Some of them are good at manipulating objects with anthropomorphic arms but move over 
a wheeled platform. Some others walk on two legs but lack of a torso and arms. Some 
combine those two features but lack a human appearance or communication abilities. Some 
other developments concentrate on human-like communication skills, like speech 
recognition, gestures and the generation of facial expressions that denote sadness, 
happiness, fear or any other state that a human is able to recognise. All these aspects are 
crucial for the final goal of attaining a robot that is perceived as humanoid. A robot that 
transmits its feelings, ideas or thoughts, that behaves like a human when performing a task 
or a movement and with which a human feels safe and confident to collaborate with are key 
points for the social acceptance of a humanoid robots. 

A description of the state of the art might start chronologically since the development of the 
first complete humanoid, the Wabot-1 from the Waseda University in 1970 but we choose to 
list the developments on the humanoid field beginning with the systems that incorporate 
the more human-like features and are considered the most advanced systems to continue 
describing systems that work on single or a combination of several human-like aspects, all 
of them of major interest and importance. 

Before describing the most important developments on the field, it is worth mentioning a 
few aspects about observable facts depending on the origin of the robot: namely, Asia, 
Europe or USA. There are differences on the complexity of the systems but also on the 
different approaches that are followed or the motivations that lead the development of the 
robot. Japan (and Korea in a minor extent) are seen as world leaders in the humanoid robot 
research. They have the most complex robots with the most similar human resemblance. 
They believe in a complete immersion of the robots in a future society, where robots do not 
differentiate easily from humans. The more remarkable points of their developments are the 
hardware (the mechanics), the physical appearance of the robot and the fact that the 
industry is leading the research on these robots, expecting a huge market in a near future. 
USA entered the humanoid era because of the needs posed by the claims of the 'modern' 
Artificial Intelligence: the need for a human-like body as a prerequisite for a robot to achieve 
human-like intelligence. It is the interaction with the environment and the gathered 
experience what is thought to be the basis for the appearance of intelligent behaviors. 
Europe, on the other side, is basically concerned with giving a real application to the 
development of humanoid robots and that is on the service robotics area, for rehabilitation 
and/ or personal care of the elderly. The most advanced systems are heading towards that 
goal. At the same time, inspiration from biological systems is a very common term to 
describe the approaches used in those robots. Several European projects work on 
sensorimotor coordination, cognitive architectures and learning approaches that have their 
roots in the cooperation with scientists in the neuroscience, biology and psychology areas. 
The ASIMO robot from Honda (Hirai, 1998) (cf. figure 1) is without a doubt the most 
advanced humanoid robot nowadays. Honda employed vast human and economical 
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resources to achieve a complete human-like looking robot, pushing forward the research in 
many areas. The current research model is 130cm tall, weights 54 kg and is able to run at 
6km/h (December 2005). The research began in 1986, achieving a first ASIMO prototype in 
the year 2000. Nowadays, ASIMO is the only robot that is able to autonomously walk 
around and climb stairs and slopes. Furthermore, it is able to understand some human 
gestures and interact with people using its speech recognition system and some pre- 
programmed messages. ASIMO can also push a cart, keeping a fixed distance to it while 
moving and still maintaining the capability to change direction or speed of movement, 
walking hand-in-hand with a person or carrying a tray. 





Fig. 1. Honda ASIMO (left picture), Sony QRIO (middle picture), and ROBONAUT (right 
picture). 

The HRP-2P (Kaneko, 2003) robot specified by AIST (National Institute of Advanced 
Industrial Science and Technology, Japan) and whose hardware was manufactured by the 
Kawada Industries (a company that also worked with Honda and the University of Tokyo 
in the development of the ASIMO and the H6-H7 robots) is one of the most advanced 
humanoids nowadays. It differs from ASIMO on the fact that it is a research prototype 
whose software is open to any roboticist. Moreover, it was designed to walk on uneven 
terrains and recover from falling positions, features not yet possible for ASIMO. It weights 
58kg and is 154cm tall. 

Probably the third robot in importance in Japan is the H7 (Kagami, 2001) from the 
University of Tokyo. However, there is not much information available apart from videos 
showing its capabilities walking on a flat terrain. As above mentioned, Kawada Inc. was 
responsible for the hardware development. It weights 55kg, is 147cm tall and has 30 DoF. 
Sony entered the humanoid world in 1997 with the SDR-1X series, achieving the SDR-4X 
version in 2003, named QRIO (Ischida, 2003) (cf . Fig. 1) as was intended to be commercially 
available. In 2006 Sony announced the decision to stop the further development of the robot. 
QRIO is comparable to ASIMO in its walking capabilities although since it was designed as 
an entertainment robot, its size is substantially smaller than ASIMO: its weights 7kg and is 
58cm tall. Its main features include the ability to adapt its walking to the most difficult 
situations: from walking on irregular or tiled terrains to react to shocks and possible falling 
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conditions. But since its origins as entertainment robot, the most remarkable features are 
those that enhanced its interaction capabilities with people: the robot is able to recognise 
faces, use memory to remember a previously seen person or his/her words, detect the 
person who is speaking and incorporates a vocabulary of more than 20,000 words that 
enables the robot to maintain simple dialogues with humans. 




Fig. 2. The Robot BIN-HUR based on Kondo's KHR-1. 

Hubo (II Woo, 2005) is the most well-known humanoid robot in South Korea and one of the 
world's most advanced. It is the latest development of the series of KHR robots (KHR-1, 
KHR-2 and KHR-3 - Hubo). It is 125cm tall and weights 56kg, having 41 DoF. Apart from 
improving in this latest version its walking abilities, Hubo is now also able to talk to 
someone by using a speech recognition system. 

Fujitsu also entered the humanoid area in 2003 with the HOAP-1 (Murase, 2001). Its major 
claim with it was its learning capabilities and the use of neural networks to control the 
locomotion implementing a Central Pattern Generator (CPG), proven to be one of the 
responsible neural circuitry for the locomotion on vertebrates. These artificial neural 
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oscillators are used to create rhythmic motions to generate the appropriate gait. The major 
advantage is claimed to be its adaptation to the environment and new terrain 
configurations and the minimum computational effort to control the locomotion. No need 
for modelling kinematics, dynamics or generating stable trajectories using complex 
criteria are required. It was intended to be used in research labs and universities as an 
educational tool where to test different algorithms and for that reason provides an open 
source software and weights only 6kg and is 48cm tall. It can walk up to 2km/h and is 
sold at about 50,000€. In 2004, HOAP-2 received the Technical Innovation Award from the 
Robotics Society of Japan. 

Toyota also presented a series of partner robots (2005), one of them walking in two legs, 
finding its application in the elderly care and rehabilitation. As a curious feature, Toyota 
included artificial lips with human finesse what, together with their hands, enables them to 
play trumpets in a similar way a human does. 

WABIAN-RIII and WENDY (Ogura, 2004) are the latest developments from the Waseda 
University, as already mentioned, the pioneers in the humanoid field with the first full-scale 
humanoid robot, a project that began in 1970 and finished three years later with Wabot-1. 
WABIAN-RII continues the research in dynamical walking plus load carrying and the 
addition of emotional gesture while performing tasks. Likewise, WENDY incorporates 
emotional gestures to the manipulation task that is being carried out. WABIAN-RII weights 
130kg and is 188cm tall while WENDY is 150cm tall and weights 170kg. 

Johnnie (Loffler, 2000) is probably the most well-known and advanced humanoid robot in 
Europe. It was developed at the University of Munich with the aim of realising a human- 
like walking, in this case based on the well-established Zero Moment Point (ZMP) 
approach introduced by Honda in the ASIMO robot, but with the aid of a vision system. It 
is able to walk on different terrains and climb over some stairs. It is 180cm tall and 
weights 45kg. 

Robonaut (Ambrose, 2001) (cf. figure 1) is a humanoid robot developed by the NASA with 
the aim of replacing a human astronaut in EVA tasks (outside the vehicle). The main feature 
of the robot is a human dexterous manipulation capability that enables it to perform the 
same tasks an astronaut would perform and with the same dexterity. The robot is not 
autonomous but tele-operated from inside the vehicle. Since legs have no utility in space, 
the robot is composed of two arms and a torso that is attached to a mechanical link enabling 
the positioning of the robot in any required position/orientation. Because of the bulky suits 
the astronauts have to wear to protect against radiations, their manipulation capabilities are 
greatly reduced and the handles, tools and interfaces they use are designed to be handled 
with their special gloves. A robot, even though needing some protection against radiation, 
would not required such a bulky suit thus recovering to a certain extent a human dexterity. 
Moreover, risks for astronauts are avoided on these missions outside the spatial vehicle. It 
has the size of a human torso and arm, with 54 DoF in total: 14 for each hand, 7 for each arm 
and the link to the vehicle, 2 in the neck and 3 on the waist. 

In the field of human-robot interaction, the robot Cog (Brooks, 1998), from the MIT AI 
Lab, is the best example. Cog is composed of a torso, two arms and a head. The main 
focus of this project is to create a platform in order to prove the ideas exposed by 
Rodney Brooks claiming that human-like intelligence appearance requires a human-like 
body that interacts with the world in the same way a human does. Besides, for a robot 
to gain experience in interacting with people it needs to interact with them in the same 
way people do. One underlying hope in Brooks theory is that: Having a human-like 
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shape, humans will more easily perceive the robot as one of them and will interact with 
it in the same way as with other people, providing the robot with valuable information 
for its interaction learning process. These ideas have been taken to the next level with 
Kismet (Breazeal, 1998), which is also a robot from the same Lab. In this case, a robot 
composed of a human-like head. The research focus is on natural communication with 
humans using, among others, facial expressions, body postures, gestures, gaze 
directions and voice. The robot interacts in an expressive face-to-face way with another 
person, showing its emotional state and learning from humans as kind of a parent-child 
relation. 

In the same direction, the Intelligent Robotics Lab of the University of Osaka (Japan) 
developed in 2005 the most human-looking robot so far. It is a female robot named Repliee 
QlExpo (Ishiguru, 2005). Its skin is made of silicon what gives it a more natural appearance 
and integrates a vast number of sensors and actuators to be able to interact with people in a 
very natural and friendly way. Even the chest is moved rhythmically to create the illusion 
that the robot is breathing. 

However, as promising as some of these developments seem to be at first glance, one has to 
carefully evaluate what exactly can be learned for the field of 'embodied cognition' from the 
study of more or less isolated features of human behavior, whether that be in the field of 
complex locomotion, manipulation or interaction. In her paper (Wilson, 2002) identifies six 
viewpoints for the new so-called 'embodied cognition' approach: 

1) Cognition is situated: All Cognitive activity takes part in the context of a real world 
environment. 

2) Cognition is time pressured: How does cognition work under the pressures of real 
time interaction with the environment 

3) Off-loading of cognitive work to the environment: Limits of our information 
processing capabilities demand for off-loading. 

4) The environment is part of the cognitive system: because of dense and continuous 
information flow between the mind and the environment it is not meaningful to 
study just the mind. 

5) Cognition is for action: Cognitive mechanisms (perception/ memory, etc.) must be 
understood in their ultimate contribution to situation-appropriate behavior. 

6) Off-line Cognition is body-based: Even when uncoupled from the environment, the 
activity of the mind is grounded in mechanisms that evolved for interaction with 
the environment. 

We have cited all six viewpoints here, as they represent an interesting perspective on the 
state of the art in embodied cognition. In the experimental work presented here we focus 
our attention on viewpoint 2 that appears to have a crucial instantiation especially in 
humanoid robots that have to find a way to effectively and efficiently counteract the effects 
of gravity while walking. In fact looking at viewpoint 3 and 4 counteracting appears to be 
wrong from the beginning instead having gravity work for the system appears to be a better 
way of achieving robust locomotion in a technical two legged system. 

3. Robust Biped Locomotion Control 

In this section we describe an experiment with a humanoid robot that achieves robust 
locomotion in the absence of a kinematical model. For stable goal directed behavior it relies 
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solely on two simple biological mechanisms that are integrated in an architecture for low 
level locomotion control. 

3.1. The Hardware 

The robot (see figure 2) is based on the Kondo KHR-1 construction kit and has 18 
DOFs, 5 per leg, 3 per arm, and 2 as pan tilt unit for the head. The system is 40 cm 
high (30 cm shoulder-height) and has a total weight of 1.5 Kg. Its mechanics are 
mainly part of the Kondo KHR-1 construction kit. As control unit we use a custom- 
made microcontroller board. 

An ADXL202 tilt sensor was integrated in the upper body to provide information about the 
pitch of the robot. Pressure sensors in the feet indicate if they have ground contact. For 
wireless communication we use a Bluetooth module. The head consist of a CMUCam2 
which is used for color tracking affixed on a pan tilt unit. 

The microcontroller board is composed of an MPC 565 PowerPC microcontroller mounted 
on a custom-designed mainboard. The MPC 565 is running at 40 MHz, has 2 MB flash 
memory and 8 MB RAM. Amongst others it is equipped with three time processing units 
(TPUs) each with 16 channels, two analog digital converter modules (ADCs) each with 64 
channels, and two RS-232 interfaces. 

The mainboard provides 32 Servo plug-in positions which are connected to two of the TPUs 
to generate pulse width modulated (PWM) signals for the activation of the Servos. 
Furthermore, the plug-in positions are connected to the ADCs to feed back the servo's 
current and the actual position on the basis of its potentiometer value. 

Two more plug-in positions each with 8 pins are linked with the ADCs to connect 
additional analog sensors, e.g. tilt sensors or pressure sensors. 

The CMUCam2 is an intelligent camera module which was developed at the Carnegie 
Mellon University. It has the integrated possibility to track colors and to communicate over 
a RS-232 connection. In our case we will use it to track the color of a ball, which will be 
provided as sensory information for a higher level behavior whose intention it is to follow 
the ball 

3.2 The control architecture 

Our architecture is based on two approaches to robust and flexible real world locomotion in 

biological systems, which seem to be contradictory at first sight. These are the Central 

Pattern Generator (CPG) model and the pure reflex driven approach. 

A CPG is able to produce a rhythmic motor pattern even in the complete absence of sensory 

feedback. The general model of a CPG has been identified in nearly every species even 

though the concrete instantiations vary among the species to reflect the individual 

kinematical characteristics in the animals. 

The idea therefore seems to be very promising as a concept to realize locomotion in 

kinematically complex robotic systems, see figure 3. As it resembles the divide and conquer 

strategies that are reflected in nearly all solutions to complex control problems. 

Another model for the support of robust locomotion is also provided by evolution in the 

animal kingdom. This is the concept of reflex based control (Delcomyn, 1980). A reflex can 

be viewed as a closed loop control system with fixed input/ output characteristics. In some 

animals, like the locust, this concept is said to actually perform all of the locomotion control 

and no further levels of control, like the CPG, are involved (Cruse, 1978). 
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Whether or not complex motion control can be achieved only via reflex systems is 
subject to further discussion, however, the concept of a set of fixed wired reactions to 
sensory stimuli is of high interest to roboticists who aim to gain stability in the systems 
locomotion. 




Fig. 3. The low level control architecture. On the global level (light gray area) we have 
implemented Locomotion Behaviors (LB's), typically (Forward, Backward and Lateral 
locomotion). These global behaviors are connected to all local leg controllers and activate 
the local single leg motion behaviors. The local level (dark gray area) implements Rhytmic 
Motion Behaviors (RMB's) and Postural Motion Behaviors (PMB's). These behaviors 
simultaneously influence the amplitude and frequency parameters of -in this case- three 
oscillating networks (OST, OSB and OSD). The oscillators are connected to a common clock 
which is used for local and global synchronization purposes. The oscillators output is a 
rhythmic, alternating flexor and extensor, stimulation signal (see callout box), which is 
translated into PWM signals via the motoric end path. Inline with the output of the motoric 
end path are a set of pertubartion specific reflexes, which override the signals on the end 
path with precompiled activation signals if the sensor information from the physical joints 
meets a set of defined criteria. 

The design of the control architecture described here was thus driven by these two concepts. 
The CPG approach appeared to be interesting to generate rhythmic walking patterns which 
can be implemented computationally efficient, while the reflex driven approach seemed to 
provide a simple way to stabilize these walking patterns by providing: 1) a set of fixed 
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situation-reactions rules to external disturbances and 2) as a way to bias leg coordination 
among multiple independent legs (Cruse, 1978). Figure 3 outlines the general idea. 
This approach features the idea of continuous rhythmic locomotion as well as postural 
activity which is generated by spinal central pattern generators in vertebrate systems 
(Kirchner, 2002), (Spenneberg, 2005). 

For our technical implementation, these activities are solely defined by 3 parameters: 
amplitude, frequency, and offset of the rhythmic movement. Please note the possibility to 
set amplitude and frequency to zero, just modifying the offset parameter, which would 
result in linear, directly controlled joint movements. In those cases where amplitude and 
frequency have non-zero values, the activation patterns will result in a rhythmic movement 
of the joint around the offset (or baseline) with given frequency and amplitude. To produce 
complex locomotion patterns, like forward, left, right, or backward movements, all joints of 
the robot have to be activated simultaneously, while some (legs, shoulder, and hip) actually 
produce rhythmic activities, others, (like neck, elbow, etc.) will have their amplitude and 
frequency values set to zero maintaining a position at the offset value. One important aspect 
of central pattern generators is their nature as feedback control loops, here the so-called 
proprioceptive information is fed back into the controller and modifies its activity. 

4. Implementation Issues 

Our implementation of the low-level control architecture, shown in figure 5, consists of a 
combination of drivers and behaviors, which are connected thru special functions (merge 
functions). Our concept for locomotion is a combination of balance control, see figure 4 and 
posture behaviors, which should keep the robot balanced while walking or during external 
interferences. Central Pattern Generator (CPG) behaviors are used to produce rhythmic 
motions for walking. The speed at which the rhythmic motions are performed is defined by 
a global clock. 
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Fig. 4. The control cycle for balance control. 

A higher level behavior 'walking' has the task to implement directional walking and 
another high level behavior 'ball-following' will use the sensory information of the 
CMUCam2 to follow the intention to track a ball by giving instructions to the walking 
behavior. 



5. Experimental Results 

To demonstrate a possible result for the activation pattern of a joint while overlaying 
different CPGs and modifying the posture, we first let the robot walk hanging in the air 
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without any balancing behavior in order to get even curves. We could not let the robot 
walk on the ground without balance behaviors because then the system braces and 
topples down. 
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Fig. 5. Implementation of the low-level control concept on the humanoid robot. 
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Figure 6 shows the desired angles for both hip forward joints at a pulse of 2000 

milliseconds. The right and left legs curve are shifted by half the period because one leg is in 

the swing phase while the other one is in the stance phase. 

The rhythm from ms to 15000 ms is only generated by the forward CPG, however, the 

offset value is set to 10 degrees from 5000 ms to 10000 ms and combined via add merge, 

thus resulting in a more ducked posture while walking. 

From 15000 ms to 21000 ms the forward and turn left CPGs are active and mixed together 

with an average merge which has the effect that the robot takes a moderate left curve. 

In the time segment from 21000 ms to 27000 ms just the CPG for turning left is active and 

after that there is only a basic posture value. 
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Fig. 6. Reference angle for right and left hip forward joint while walking hanging in the air 
without balance behaviors. 



5.1. Walking with balance 

When the robot walks on the ground with active balance behaviors, the desired joint 

angles of the balance behaviors are added to the values of the posture behavior and the 

active CPGs. In this experiment, we let the robot also walk with a pulse of 2000 

milliseconds. 

First we let it walk on the ground without active balance behavior to show the desired and 

real angles of the leg joints with resistance of the gravity and the robot's weight. During this 

run, shown in figure 9, we prevented the robot from toppling down by hand. From ms to 

6000 ms the robot walked forward, and then took a moderate left curve till 12000 ms, and 

after that it turns left on the spot. 

In the second trial, shown in figure 10, we activated the balance behavior. The robot walked 

forward from ms to 4000 ms, followed by a moderate left curve till 12000 ms, and then 

turned left on the spot. 

As you can see in figure 10, the activation of the balance behavior results in noisier curves 

than just walking forward without balance behaviors like in figure 9 but it stabilizes the 

system and prevents it from bracing. 

The balance behavior which is designed as a PID-Controller takes the tilt value shown in 

figure 7 as input for the controller and writes the controller's output values multiplied with 

a specific factor for each joint to the servos. Negative sensor values represent a right or 

rather rear leaning, and positive values a left and accordingly front leaning. 
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As you can see, the output of the PID-Controller shown in figure 8 is less noisy than the tilt 
sensor values and seems to be more rhythmic. The pattern is repeated every 2000 
milliseconds which shows that frequency of the interferences and the retaliatory action 
depend on the pulse. 
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Fig. 7. Values for rear-front and right-left tilt, while walking on the ground with active 
balance behavior. 
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Fig. 8. Calculated error from the balance behavior's PID-Controller while walking on the 
ground with active balance behavior. 



5.2. Reaction on a lateral hit 

To test the static stability we used the following experimental setup. A ball with a weight of 
250 grams was fixed as a pendulum over the robot. The band it is attached with has a length 
of 15 cm. Then we let the ball fall from a height of 15 cm 5 times and hit the robot at the 
right shoulder. If we do not activate the balance behaviors the robot cannot absorb the hit 
just by his stable standing resulting from the posture behavior and topples down. With 
active balance behaviors the robot tries to react against the hit because of the difference 
between the desired and the actual leaning and is able to stay and adjust his balance. 
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Figure 12 shows the desired and real angle of the left arm and leg joints resulting from the 
reaction of the balance behavior as an average of the 5 recurrences. 

Figure 13 shows the perception of the hit by the tilt sensor whose values are used as input 
for the balance behavior's PID-Controller, shown in figure 14. 
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Fig. 9. Desired and real angle (degree) from left leg joints while walking on the ground 
without active balance behavior over 20000 ms. The robot was prevented from toppling 
down by hand. 
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Fig. 10. Desired and real angle (degree) from left leg joints while walking on the ground 
with active balance behavior over 20000 ms. 
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Fig. 11. Snapshots of the reaction on a lateral hit in 200ms intervals. 



6. Discussions and future work 

The CPG based approach for combining rhythmic movements in two-legged robotic 
systems does work and produces less calculating costs than inverse kinematics (Tevatia, 
2000), it results in smoother movements than using simple look up tables, and is easier to 
realize than neural networks (Shan, 2002). However, direct, goal directed movements are 
difficult to implement and still require kinematic models of the system. 

Recently, we are working on a biologically inspired hybrid learning architecture, see figure 
15 for embodied cognition supporting recognition and representation on the basis of 
sensorimotor coordination. The notion of hybrid architectures is straightforward in the 
literature, born from the understanding that neither reactive nor deliberative systems 
provide a sufficient basis for truly cognitive agents. It is therefore natural to postulate 
systems that contain both types of systems. An important question in these architectures is 
where to draw the line between the reactive and the deliberative component, and what their 
relationship should be. Regarding this important question, the architecture we are working 
on meaningfully combines a reactive layer and a higher level deliberative layer. The reactive 
layer will be responsible for attention control, object categorization, and reflex triggering. 
The locomotion approach we have developed for the humanoid robot will be integrated in 
the reactive layer and will be exploited in approaching and manipulating objects. The 
deliberative layer will provide a means for the robot to learn and adapt to new 
environments. The ball following behavior that we want to implement, for example, can be 
implemented in the deliberative layer, where behaviors running in the reactive layer can be 
modulated and combined to achieve the required behavior 
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Fig. 12. Desired and real angle (degree) from left arm and leg joints over 4000 ms as an 
average over 5 recurrences. 
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Fig. 13 Tilt values for rear-front and right-left pitch as an average over 5 recurrences. 
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Fig. 14. Calculated error from the balance behavior's PID-controller as an average over 5 
recurrences. 
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Fig. 15. An architecture integrating learning, representation and robust low-level control. 
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1. Introduction 



Within the last decade, robotic research has turned more and more towards flexible 
assistance and service applications. Especially when cooperating with untrained persons at 
small distances in the same workspace, it is essential for the robot to have a deep 
understanding and a reliable hypothesis of the intentions, activities and movements of the 
human interaction partner. 

With growing computational capacities and new emerging sensor technologies, methods 
for tracking of articulated motion have become a hot topic of research. Tracking of the 
human body pose (often also referred to as Human Motion Capture) without invasive 
measurement techniques like attaching markers or accelerometers and gyroscopes 
demands (1) for algorithms that maximally exploit sensor data to resolve ambiguities that 
compulsorily arise in tracking of a high-degree-of-freedom system, and (2) for strong 
models of the tracked body that constrain the search space enough to enable fast and 
online tracking. 

This chapter proposes a 3D model for tracking of the human body, along with an iterative 
tracking approach. The body model is composed of rigid geometric limb models, and joint 
models based on an elastic band approach. The joint model allows for different joint types 
with different numbers of degrees of freedom. Stiffness and adhesion can be controlled via 
joint parameters. 

Effectiveness and efficiency of these models are demonstrated by applying them within an 
Iterative Closest Point (ICP) approach for tracking of the human body pose. Used sensors 
include a Time-of-Flight camera (depth camera), a mono colour camera as well as a laser 
range finder. Model and sensor information are integrated within the same tracking step for 
optimal pose estimation, and the resulting fusion process is explained, along with the used 
sensor model. The presented tracking system runs online at 20-25 frames per second on a 
standard PC. 

We first describe related work and approaches, which partially form the basis for 
the presented models and methods. Then, a brief introduction into the ICP is given. 
The model for body limbs and joints is explained in detail, followed by a 
description of the full tracking algorithm. Experiments, examples and different 
evaluations are given. The chapter closes with a discussion of the achieved results 
and a conclusion. 
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2. Related work 

Tracking of human body motion is a highly active field in current research. Depending on 
the target application, many different sensors and models have been used. This includes 
invasive sensors like magnetic field trackers (Ehrenmann et al., 2003; Calinon & Billard, 
2005) that are fixed to the human body. Within the context of human robot interaction in 
every-day life, this approach is not feasible; non-invasive tracking approaches must be 
applied. Most of these are based on vision systems, or on multi-sensor fusion (Fritsch et al., 
2003). Systems which rely on distributed sensors (Deutscher et al., 2000) are not practicable 
in the given domain; the tracking system must be able to rely only on sensors mounted on 
the robot. 

Several surveys exist on the area of tracking humans (Aggarwal & Cai, 1999; Gavrila, 1999; 
Moeslund & Granum, 2001; Wang et al., 2003). Possible applications range from the 
mentioned human-robot interaction to surveillance and security domains. Hence, there is a 
big variety of methods ranging from simple 2d approaches such as skin colour segmentation 
(Fritsch et al., 2002) or background subtraction techniques (Bobick & Davis, 2001) up to 
complex reconstructions of the human body pose. (Ramanan & Forsyth, 2003) shows how to 
learn the appearance of a human using texture and colour. 

Sidenbladh (Sidenbladh, 2001) used a particle filter to estimate the 3d pose in monocular 
images. Each particle represents a specific configuration of the pose which is projected into 
the image and compared with the extracted features. (Cheung et al., 2003) use a shape-from- 
silhouette approach to estimate the human's pose. 

A similar particle filtering approach is used in (Azad et al., 2004). The whole body is tracked 
based on edge detection, with only one camera. The input video stream is captured with 
60Hz, which implies only small changes of the configuration between two consecutive 
frames. As it is a 2d approach, ambiguities of the 3d posture can hardly be resolved. 
An ICP-based approached for pose estimation is shown in (Demirdjian & Darrell, 2002). The 
authors use cylinders to model each body part. In (Demirdjian, 2003) the same authors show 
how they model joint constraints for their tracking process. However, it the effect of the ICP 
is partially removed when the constraints are enforced. Nevertheless, parts of the work 
described in this chapter are based on the work of Demirdjian. 

3. Sensors and framework 

For tracking of the human in a human-robot interaction context, only the sensors onboard 
the robot can be used. In our setup, we use several different sensors as input for the tracking 
algorithm, which fuses all available information to obtain an optimal estimation of the 
current pose of the human. 

3.1 Sensors 

3D point clouds are acquired by a Time-of-Flight camera. This depth camera called 
Swissranger (CSEM, 2006) has a resolution of 160 x 124 pixels and a depth range of 0.5 to 7.5 
meters. Fig. 1 shows the depth image of an example scene. Alternatively, point clouds from 
reconstruction of stereo images can be used. 

A standard single FireWire camera is used to obtain colour images. These are processed by a 
standard skin-colour based algorithm to track head and hands in the image. All image 
regions which are candidates for skin regions are provided to the tracking algorithm. 
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A SICK laser range finder which is mounted for navigation purposes delivers 3D points in 
its measurement plane. These points are also provided to the tracking. 




Fig. 1 Depth image, retrieved with the Time-of-Flight sensor. The point cloud is visualized 
in 3D, and depth is additionally encoded in colours. Green is equivalent to near, and blue to 
distant measurements. 

3.2 Iterative Closest Point algorithm 

This section gives a short introduction to the Iterative Closest Point (ICP) algorithm. The 
goal of the ICP is to match two indexed sets of the same points which are given in different 
coordinate systems and calculate the translation t and rotation R that transform the first 
coordinate system into the second. 

For tracking, the first set corresponds to the data points of the sensor and the second set 
corresponds to points on the surface of a rigid body. Following (Besl & McKay, 1992), the 



first set is denoted P 

N r =N 



■ {p t } , the second one X - {x, } . Both sets have the same size with 



,, — N and each point p ; corresponds to point X t . 



With six degrees of freedom, at least three points are necessary to compute translation and 
rotation. Because sensor data is always corrupted with noise, no exact solution exists; 
instead, the problem is solved as a minimization problem for the sum of squared distances: 
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In equation (3), the first part is independent from t ' , the second part reveals to zero. 
Therefore the function becomes minimal if t '= . Transformation yields 

t=P p -R(P x ) (4) 

Having the optimal translation (giving t '= ), equation (2) becomes: 

^ N 2 

/(*,?)=—£ !*(*■, w,! (5) 

Considering ||i?(x,-')|| = \\xA\, the equation can be rewritten as 

1 N N l 
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Gives the optimal rotation. See (Horn, 1987) for details. 

The described method can be used to compute the translation and rotation for two point sets 
P and TV , where each p i corresponds to x, . These point sets and their association has to 
be set up from the measurements and the surface model of the tracked body. The first set 
can be directly derived from the measured 3D point cloud. N is constructed by calculating 
for each data point p i the closest point on the model surface, giving x, . This can be done with 
geometrical relations. Having these two indexed point sets, the optimal rotation and 
translation can be computed and applied to the model. 

Association of model surface points to measurements can only be estimated. This estimation 
usually results in a non-optimal transformation, and the algorithm is iteratively applied 
until a minimum is reached. In summary, the Iterative Closest Point (ICP) algorithm works as 
follows: 

1. For the given mode position and the given measurement M , calculate the closest 
points on the model giving CPq . 

2. Calculate the sum of squared distances between data points and model points, 
giving d (M,CP )- 

3. Estimate rotation and translation and apply to the model. 

4. Calculate the new set of closest points with the new model position, giving CP t . 

5. Calculate the sum of squared distances between data points and model points 
giving d^M^CP,). 

6. If d t _ x (M, CP t _x ) ~ d, (M, CP t )<£ , stop iteration. Otherwise go to step 3. 

Note that the computation of the closest points on the model is the most time consuming 
step in the ICP loop, since it includes geometrical calculations for each data point in the 
point cloud. 

The ICP as mentioned above can only cope with one rigid body as model. For extension of 
the method to articulated models, the tracked body has to be modelled with a set of rigid 
bodies, which are connected to one articulated model. Two problems have to be solved in 
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this context: (i) data points must be associated with body parts, before they can be used for 

tracking, because each body part is transformed separately by the ICP. (ii) A mechanism 

must be established which introduces the joint constraints of the model in the tracking 

process, to avoid drift of the limbs due to separate tracking of each part. 

The first issue is solved by checking each measurement point with all body part models, and 

assigning it to the geometrically closest body part. In addition, a threshold is used to discard 

measurements which have too high distance to the model. 

To avoid that the body parts drift apart, we introduce a novel joint model. The complete 

body model together with the joint model will be explained in the next section. 

4. Proposed body model 

For the tracking system a 3d body model is used. Each body part is represented with a 
degenerated cylinder. The top and the bottom of each cylinder are described by an ellipse. The 
ellipses are not rotated to each other and the planes are parallel. In total such a body is 
described by five parameters: major and minor axis of each ellipse, plus the length of the 
cylinder. 

The overall body model is built in a tree-like hierarchy starting with the torso as root body 
part. Each child is described with a degenerated cylinder and the corresponding 
transformation from its parent. Up to now the body model consists of ten body parts (torso, 
head, two for each arm and two for each leg) which is depicted in Fig. 2. It should be 
mentioned that this body model is not necessarily restricted to humans, and also other 
bodies can be modelled easily. 




Fig. 2. Hierarchical body model, consisting of 10 cylinders and 9 joints. The torso describes 
the root limb. 

If the fusion algorithm also incorporates data from feature trackers (like some vision based 
algorithms, or magnetic field trackers that are fixed on the human body), it is required to 
identify certain feature points on the human body. This is done following the H | Anim 
Specification (H | Anim, 2003). 



4.1 Joint constraint model 

The joint model we propose is based on the concept of introducing elastic bands into the 
body model. These elastic bands represent the joint constraints. For the ICP algorithm, the 
elastic bands can be modelled easily as artificial correspondences and will thus be 
considered automatically in each computation step. 
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For each junction of model parts, a set of elastic bands is defined (see Fig. 3). These relations 
set up corresponding points on both model parts. The corresponding points can then be 
used within the model fitting process to adjust the model configuration according to any 
sensor data input and to the defined constraints. 




Fig. 3. Different joint types. Universal joint with 3 degrees of freedom (a), hinge joint with 
one full and 2 restricted degrees of freedom (b), and (c) elliptic joint with 3 restricted degrees 
of freedom. 



4.2 Joint Constraint Types 

With this approach, different types of joints can be modelled. Looking at a model for the 
human body, different kinds of joints with varying degrees of freedom are required: 

• Universal Joints have 3 full degrees of freedom. This joint type can be found e.g. in 
the shoulder. The upper arm can rotate up/ down, forward/ backward and around 
its main axis. Universal joints are modelled by a point-to-point correspondence 
(one elastic band) between both body parts with one point on each, see Fig. 3 a). 

• Hinge Joints have one real degree of freedom, the others being almost fixed. This 
can be found e.g. in the human knee or elbow (only 1 DoF), or in the hip (1 real 
DoF, the other two existing, but highly restricted in motion). Hinge joints are 
modelled by a set of correspondences which are distributed along a straight line on 
both body parts. The same restriction can be achieved with correspondences only 
at each end of the line (two elastic bands), see Fig. 3 b). 

• Elliptic Joints have all degrees of freedom highly restricted. An example on the 
human body is the neck (or the wrist): Motion is possible in all 3 degrees of 
freedom: Left/ right, forward/ backward, and turning. Each direction is very 
limited in range. Elliptic joints are modelled by a set of correspondences 
distributed along an ellipse on both body parts. This restriction can be achieved 
with correspondences on each end of the main axes of the ellipse (four elastic 
bands), see Fig. 3 c). 

Universal and hinge joints are special cases of the elliptic joint. For the hinge joint, one major 
axis of the ellipse is set to zero, resulting in a straight line. Setting both axes to zero produces 
a universal joint, because all correspondences are reduced to one point-to-point relation. 
Following these definitions, each joint is modelled with a set of parameters describing the 
type of joint and its behaviour. This parameter set consists of the major axes of the ellipse, its 
position and orientation on both body parts, and the weight of the given correspondence. 
These parameters and the resulting behaviours are now described in detail. 
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Major axes: The model type (universal, hinge, elliptic) and the valid range of each degree of 

freedom control the choice for the major axes sizes and ratio. Universal joints are modelled 

with both ellipse axes set to zero. For hinge and elliptic joints, the axis direction defines the 

rotation axis, and the axis length defines the stiffness of the other two rotational degrees of 

freedom. 

In Fig. 3 b), rotational flexibility around the z-axis (perpendicular to the image plane) and 

around the symmetric axis of the cylinders is very limited due to the modelled joint. 

Position and orientation: Position and orientation of the point-to-point, hinge or elliptic 

joint model with respect to both body parts define the connection between both parts. 

Weight: When the joint model is used within a bigger tracking framework the elastic bands 

can be used as correspondences which are included as tracking constraints. The use of 

measured correspondences together with artificial ones puts up the need for correct 

weighting strategies between input and model constraints. 

To incorporate this, each joint model can be weighted with respect to measured input. This 

parameter is then used within the model-fitting algorithm to balance between measured 

input and joint constraints. The weight parameter is defined in relation to the number of 

'natural' correspondences to keep the ratio between measurements and constraints. 

To increase the weight of a joint model tightens the coupling between both model parts, by 

decreasing the coupling becomes looser. For hinge and elliptic joints, higher weight also 

increases stiffness of a kinematic chain. This makes sense especially for joints like the human 

neck or wrist which shows a very tight coupling and very limited angular range. 

The proposed joint model provides a "soft" way to restrict the degrees of freedom for the 

model parts. It additionally provides means to control the "degree of restrictiveness" for 

each DoF in a joint. Applying elastic strips is tantamount to introducing a set of forces which 

hold the model parts together. The connected points and magnitude define the joint 

behaviour. 

The soft joint model can e.g. be applied to the joint between human pelvis and thigh: While 

in forward/ backward direction the movement is almost unrestricted, there is a high 

restriction for the left/ right movement. But still a small movement is possible, and a 1-DoF 

model would not be sufficient. 

Nevertheless, it is possible to model plain bending joints with a hinge joint model with one 

large axis. 

4.3 Joint Model within the ICP 

One of the main advantages of this joint model is that it can be very easily integrated in 
tracking algorithms. The joint model is added as a second data source, which adds 
correspondences between model and real world. 

Introducing the joint model correspondences in the ICP framework (see sec. 3.2) is done by 
transforming the elastic band constraints into artificial input points according to the 
following rules: 

• For each correspondence with the weight W, W artificial point pairs are 
generated. 

• The artificial point pairs have to be added to the correspondence list after 
computation of the Closest Point Relations. 

• Because each body part is processed separately, each joint model has to be added 
twice, once to each associated body part. 
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• The generated point pairs each represent one point on the model and the 
associated artificial data point. So each pair has to be added to one body part as 
Model - Data and to the other Data - Model relation to retrieve the desired forces 
(from the elastic bands) on both model parts. These forces then try to establish the 
modelled connections. 

• The artificial correspondences are recalculated in each ICP step. 

• The chosen weight of each joint depends on the desired stiffness of the model. To 
always achieve the same stiffness during tracking, the ratio between measured and 
artificial point relations has to be constant. This means that the number of 
generated artificial points for one body part in each step depends on the quantity of 
measurements for this part of the model. The generated relations are linearly scaled 
with the number of measurements. 

• From our experience, the ratio r between measured and artificial points should be 
chosen as approximately 0.4 < r < 0.7. This gives enough cohesion within the model 
without implying to hard and static relationships. 

It is important to note that the introduction of multiple identical correspondences within the 
ICP does not increase computation time with the order of point counts (like a set of different 
measured points would). The only additional effort consists of one multiplication of the 
resulting point matrix (4x4) with the scalar weight W. 

5. Data fusion for tracking 

The goal of the tracking system is to track the posture of a human body in 3d by matching 
the internal 3d body model with the current input sensor data. Thus, the tracking system 
offers three interfaces: sensor data stream (input), parameter configuration (input), and 
current posture estimation (output). All sensor data formats that can be exploited are 
described in sec. 5.1. The configuration values we have identified will be described in sec. 
5.2 along with the processing steps. 

The current posture estimation output is given with respect to the hierarchical body model 
defined in sec. 4. In each time step, the whole body model is provided. This allows for 
changes not only in the body pose (joint angle space), but also for changes in the model itself 
(configuration and parameters of the body model). This may concern scaling of the model 
for different persons with varying body heights, or even addition and deletion of body parts 
in case of changing tracking targets or other effects. This can be useful e.g. if the tracked 
person is holding and handling a big object, which then can be added easily to the tracked 
configuration. 

The fusion algorithm, which is implemented in a tracking system called VooDoo, is depicted 
in Fig. 4. The next section describes possible input data, while sec. 5.2 depicts the processing 
steps within the tracking loop. 

5.1 Input data 

The proposed tracking algorithm is able to include, process and fuse different kinds of 
sensor data (see also Fig. 4): 

• Free 3d points from ToF-sensors or from pure stereo depth images. The system has 
to decide whether to use these points as measurements of the tracked model. For a 
point that is not discarded, the corresponding point on the model surface is 
computed. 
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3d points on the human body that are e.g. generated by a stereo vision system that 

tracks a person in image space and generates the corresponding 3d points by stereo 

reconstruction. 

3d points assigned to a single body part may also be generated by a stereo vision 

system tracking special body parts like the face or the hands. 

3d point-to-point relations are 3d points that can be assigned to a given point on the 

tracked human body. Thus, tracking of special features or points (e.g. with 

markers, or magnetic field trackers attached to the human body) can be integrated. 

2d point-to-line relations can e.g. be derived from a 2d image space based tracker. 

The pixel in the image plane together with the focal point define a ray in 3d, which 

corresponds to the point on the human body that has been detected in the image. 
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Fig. 4. The complete tracking algorithm, containing all input data streams. "BB" denotes 
Bounding Box. 

This data can originate from any sensor that gives data in the described format. Obviously, 
all input data has to be transformed into the tracker coordinate system before it is used 
within the system. In our setup, we use the 3d point clouds from laser range finder and 
depth camera as free 3d points. Extracted features (hands, head) from the colour image are 
projected onto 2d point-to-line relations. 



5.2 Processing 

For the ICP matching algorithm, a list of corresponding point pairs has to be set up for each 
limb. Therefore, all "free" 3d points have to be analyzed in order to decide whether they 
correspond to points on the tracked model. Otherwise, they are discarded. Additionally, all 
given correspondences from other tracking procedures and the background knowledge on 
joint constraints have to be added to the correspondences list. Then, the optimal resulting 
model configuration has to be computed. These steps are performed iteratively until an 
optimum of the configuration is reached. 

Before the input data of one time step is processed, it is possible to adjust internal model 
parameters. This can be e.g. the model scale factor, or particular cylinder sizes. Even limbs 
can be added to or removed from the model. 
The processing steps are now described in detail. 
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Prefiltering free 3d points: The whole point cloud of free 3d points from used depth sensors is 
processed in order to remove all points that are not contained within the bounding box of 
the body model (see Fig. 4, step BB Check whole body). This is done on the assumption that 
the body configuration changes only locally between two time frames. A parameter defines 
an additional enlargement of the bounding box prior to this filtering step. The resulting 
point list is concatenated with any sensor data input that has already assigned its measured 
3d points with the tracked. It results in a list of 3d points which are close to the body model 
and thus are candidates for measurements of the tracked body. 

Assigning points to limb models: The point list is now processed in order to assign measured 
points to dedicated limb models based on the bounding box of each limb model (see Fig. 4, 
step BB Check body parts). Again, the bounding boxes can be enlarged by a parameter to take 
the maximum possible displacement into account. Points that do not fall in any bounding 
box are again removed. Several behaviours can be selected for points that belong to more 
than one bounding box (overlap): These points are either shared between limb models, 
exclusively assigned to one limb or shared only in case of adjacent limbs. This last method 
avoids collisions between limbs that are not directly connected. The resulting point list can 
be joined with any sensor data input that has already assigned its measured 3d points with 
dedicated limbs of the tracked body. The resulting point list contains candidates for 
measurements of each limb. 

Point Number reduction: The resulting point list can be downsampled before the calculation 
of the closest points to reduce the overall number of points (see Fig. 4, step Downsampling). 
This step is controlled by three parameters: the sampling factor, and minimum and 
maximum number of points per limb. Thus, it is possible to reduce the number of points for 
limbs with many measurements, but maintain all points for limbs which have been 
measured with only a few points. 

Closest point computation: The closest point calculation is the most time-consuming step in the 
whole loop. For each remaining data point, the corresponding model point on the assigned 
limb model has to be computed for the ICP matching step (see Fig. 4, step Closest Point). This 
involves several geometric operations. Depending on the resulting distance between data 
and model point, all points within a given maximum distance are kept and the 
correspondence pair is stored in the output list. All other points are deleted. 
3d point-to-point relations from input data can now be added to the resulting list, which 
holds now corresponding point pairs between data set and model. 

Addition of 2d measurements: Each 2d measure (e.g. tracked features in 2d image plane of a 
camera) of a feature on the human body defines a ray in 3d which contains the tracked 
feature. This fact is used to add the 2d tracking information to the 3d point correspondences 
(see Fig. 4, step Closest point on line): For each reference point on the body model, the closest 
point on the straight line is computed and added to the list. 

Joint model integration: The joint model for each junction is added as artificial point 
correspondences for each limb, depending on the limb type (see Fig. 4, step Joint model). 
According to sec. 4.1, the correspondences can be interpreted as elastic bands which apply 
dedicated forces to the limbs to maintain the model constraints. Thus, artificial 
correspondences will keep up the joint constraints in the fitting step. 

Model fitting: When the complete list of corresponding point pairs has been set up, the 
optimal transformation between model and data point set can be computed according to 
sec. 3.2 (Fig. 4, step Least squares). The transformation is computed separately for each 
limb. 
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When all transformations have been computed, they can be applied to the model. The 
quality measure defined in sec. 3.2 is used for the fitting. All are repeated until the 
quality measure is below a given threshold or a maximum number of steps has been 
performed. 

6. Sensor model 

Each used data source has its own stochastic parameters which have to be taken into 
account. The described approach offers a very simple method for this: each input date is 
weighted with a measure that describes its accuracy. The ICP algorithm then incorporates 
these weights in the model-fitting step. Thus it is possible to weight a 2d face tracker much 
higher than a single 3d point from a Time-of-Flight camera, or to weight 3d points from a 
Time-of-Flight-camera slightly higher than points from the stereo reconstruction due to the 
measuring principle and the sensor accuracy. 

It is important to note that an increased weight for a single point does not affect the time 
needed for the computation. This is very important and is due to the fact that in the 
presented approach, each measurement is projected into model space. This is different to 
e.g. particle filtering approaches, where each particle is projected into each sensor's 
measurement space to compute the likelihood. In consequence, adding a sensor source to 
the tracking framework increases computation time only with the number of different 
measurements from the sensor. 

An example configuration can be seen in Fig. 5. The model consists of two cylinders, 
connected by a linear joint. The measurements contain a 3d point cloud, and a 3d 
measurement of one end point. This configuration can e.g. result from a stereo depth image 
of a human arm and a colour based hand tracker. 




Fig. 5. Different weights for measurements from different sensors, projected into 3D model 
space. The depicted point sizes correspond to the sensor data weight, the lines indicate the 
closest-point relations. These pictures motivated the system name VooDoo. 

7. Experiments, evaluation and results 

The described tracking procedure has been implemented and tested with a Time-of-Flight 
camera and a stereo camera. The tracking runs online at a frame-rate of appr. 20-25Hz on a 
Pentium4 with 3.2GHz with a model of a human body, consisting of 10 cylinders with 9 
joints. For the experiments, the same data sequences have been processed using different 
input sensor configurations to test the fusion. 

Fig. 6 shows example images from a sequence of 15 seconds containing a "bow" and a 
"wave" movement. The first row shows the scene image, which has been also used for 
segmentation of face and hands. The second and third row contain the tracking result 
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with 3d data only (row 2) and 2d data only (row 3), where the 3d data has been 

acquired with the Time-of-Flight camera and the 2d data is derived from skin colour 

segmentation in one image of the stereo camera. The rays in 3d defined by the skin 

colour features can be seen here. Row 4 shows the tracking result with both inputs 

used. 

For the shown results, the following weights for the input data have been used: 3d data 

points w = 1.0, face tracker w = 30.0, hand tracker w = 20.0. 
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Fig. 6. Experiments with different sensor inputs, taken from a sequence containing a "bow" 
and a "wave" movement. The frame number is displayed on the top. The used 2d and 3d 
correspondences have been added to the resulting model images. 

Different conclusions can be drawn from the results: 

• Huge movements are easily detected by the 3d data based tracking: The "bow" 
movement is tracked quite well. On the other hand, fast movements with the 
extremities may cause failures when only 3d data is used, as with the "wave" 
movement. 

• Tracking only with a 2d feature tracker works quite well for the tracked body parts. 
Nevertheless, the body configuration cannot be determined only from 2d features 
(see frame 81). To do this, a lot more background information on the human body 
would be needed. 

• Fusion of both input sensors in 3d shows very good results: Huge body 
movements as well as fine and fast movements of the extremities can be 
recognized, and the algorithm is able to reliably track the body 
configuration. 

As already stated in sec. 6, the computational effort and thus the frame-rate depends on the 
true number of different measurements, independent of the particular weights. To evaluate 
the computational performance and frame-rate of the presented method, several analyses 
have been carried out. The model corresponds again to the human body model used for the 
analysis in Fig. 6. 
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Fig. 7. Number of ICP steps required for a typical tracking sequence. 
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Fig. 8. Time consumption per ICP step vs. number of ICP steps. 

The computational effort for one frame depends first of all on the number of ICP steps 
needed. The number of iterations again depends on the body displacement between two 
consecutive frames. Fig. 7 shows the number of required ICP steps during a typical tracking 
sequence for a human body model. During phases without large movements, one iteration 
is enough to approximate the body pose (frame 500 to 570). Extensive movements are 
compensated by more ICP iteration steps per frame (650 to 800). 

The required time per frame obviously increases with the number of needed ICP steps. This 
relation is shown in Fig. 8. A maximum number of 6 ICP steps has turned out to be a good 
trade-off between time consumption per frame and tracking accuracy. This leads to a frame 
period of 20 - 70ms, which corresponds to a frame-rate of 14.2 to 50Hz. The maximum 
frame-rate in our framework is only constrained by the camera frame-rate, which is 30Hz. 
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Fig. 9. Time consumption per frame vs. number of body measurements. 

The relation between the number of body measurements and the computational effort for 
one ICP step is depicted in Fig. 9. For each measurement of the target, several computations 
have to be carried out. This leads to the dependency in Fig. 9. As expected, the time scales 
linearly with the number of measurements. 

These results show that the presented tracking approach is able to incorporate several 
thousand measurements with reasonable computational effort. One disadvantage of the 
depicted iterative process is the negative dependency between target displacement and 
computational effort: The faster the target moves, the longer the tracking needs for one 
frame, which again leads to larger displacements due to the low frame-rate. 
To overcome this, one has to find a good trade-off between accuracy and frame-rate. This 
compromise depends on the tracking target characteristics, as well as on the application 
which utilizes the Human Motion Capture data. It is also possible to switch between 
different behaviours, taking into account the requirements the applications which depend 
on the Motion Capture data: in case the data is used for physical interaction (e.g. handing 
over objects), the required accuracy is high, along with usually low dynamics. On the other 
hand, if the target is only to observe a human in the robot's environment, the required 
accuracy is low, but the person moves with high velocity. 



8. Discussion and conclusion 

This paper has proposed a geometric human body model, a joint model and a way for 
fusion of different input cues for tracking of an articulated body. The proposed algorithm is 
able to process 3d as well as 2d input data from different sensors like ToF-cameras, stereo or 
monocular images. It is based on a 3d body model which consists of a set of degenerated 
cylinders, which are connected by an elastic bands joint model. The proposed approach runs 
in real-time. It has been demonstrated with a human body model for pose tracking. 
The main novelty and contribution of the presented approach lies in the articulated body 
model based on elastic bands with soft stiffness constraints, and in the notion of point 
correspondences as a general measurement and model format. Different joint behaviours 
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can be modelled easily by distributing the elastic bands along two axes in the joint. The joint 
constraints are incorporated in the ICP as artificial measurements, so measurements and 
model knowledge are processed identically. The model can also be refined by adding 
cylindrical primitives for hands, fingers and feet. This is reasonable if the accuracy and 
resolution of the available sensors are high enough to resolve e.g. the hand posture, which is 
not the case in our approach due to the large distance between human and robot and the 
low measurement resolution. 

The idea of introducing artificial correspondences into the fitting step can even be exploited 
further. Current works include further restriction of the joints in angular space by adding 
angular limits to certain degrees of freedom, which are maintained valid by artificial point 
correspondences. These will be generated and weighted depending on the current body 
configuration. 

Our implementation of the described tracking framework has been released under the GPL 
license, and is available online at wwwiaim.ira.uka.de/users/knoop/VooDoo/doc/html/, 
along with sample sequences of raw sensor data and resulting model sequences. 
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Drum Beating and a Martial Art Bojutsu 
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1. Introduction 

Over the past few decades a considerable number of studies have been made on impact 
dynamics. Zheng and Hemami discussed a mathematical model of a robot that collides with an 
environment (Zheng & Hemami, 1985). When a robot arm fixed on the ground collides with a 
hard environment, the transition from the free space to constrained space may bring instability 
in the control system. Therefore, the impact between robots and environments has been the 
subject of controversy. Asada and Ogawa analyzed the dynamics of a robot arm interacting 
with an environment using the inverse inertia matrices (Asada & Ogawa, 1987). In the early 
90's, the optimum approach velocity for force-controlled contact has been enthusiastically 
studied (Nagata et al., 1990, Kitagaki & Uchiyama, 1992). Volpe and Khosla proposed an impact 
control scheme for stable hard-on-hard contact of a robot arm with an environment (Volpe & 
Khosla, 1993). Mills and Lokhorst proposed a discontinuous control approach for the tasks that 
require robot arms to make a transition from non-contact motion to contact motion, and from 
contact motion to non-contact motion (Mills & Lokhorst, 1993). Walker proposed measures 
named the dynamic impact measure and the generalized impact measure to evaluate the effects 
of impact on robot arms (Walker, 1994). Mandal and Payandeh discussed a unified control 
strategy capable of achieving a stable contact against both hard and soft environment (Mandal 
& Payandeh, 1995). Tarn et al. proposed a sensor-referenced control method using positive 
acceleration feedback and switching control strategy for robot impact control (Tarn et al., 1996). 
Space robots does not have fixed bases, therefore, an impact with other free-floating objects may 
bring the space robots a catastrophe. In order to minimize the impulsive reaction force or 
attitude disturbance at the base of a space robot, strategies for colliding using reaction null- 
space have been proposed (Yoshida & Nenchev, 1995, Nenchev & Yoshida, 1998). 
Most of the researches have been made to overcome the problems introduced by impacts 
between robots and environments. Some researchers have tried to use the advantages of 
impacts. When a robot applies a force statically on an environment, the magnitude of force 
is limited by the maximum torque of the actuators. In order to exert a large force on the 
environment beyond the limitation, applying impulsive force has been studied by a few 
researchers. Uchiyama performed a nail task by a 3-DOF robotic manipulator (Uchiyama, 
1975). Takase et al. developed a two-arm robotic manipulator named Robot Carpenter, and 
performed sawing a wooden plate and nailing (Takase, 1990). Izumi and Hitaka proposed to 
use a flexible link manipulator for nailing task, because the flexible link has an advantage in 
absorbing an impact (Izumi & Kitaka, 1993). 
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However, those works mentioned above were done using robotic manipulators fixed on the 
ground except for space robots, and thus, there was no need to take care about loosing a 
balance. Humanoid robots are expected to work on human's behalf. If a humanoid robot can 
do heavy works utilizing an impulsive force as well as a human does, the humanoid robot 
will be widely used in various application fields such as constructions, civil works, and 
rescue activities. 

The first attempt on an impact motion by a humanoid robot was reported in (Hwang et al., 
2003). Matsumoto et al. performed a Karate-chop using a small humanoid robot and broke 
wooden plates (Matsumoto et al., 2004). In order for a legged robot to effectively exert a 
large force to an environment without loosing a balance, working posture is important. 
Tagawa et al. proposed a firm standing of a quadruped for mobile manipulation (Tagawa et 
al., 2003). Konno et al. discussed an appropriate working posture of a humanoid robot 
(Konno et al., 2005). 

This chapter addresses an impact motion performed by a humanoid robot HRP-2. A drum 
beating is taken as a case study, because it is a typical task that requires large impulsive 
forces. The drum beating motion is carefully designed to synchronize with music. The drum 
beating and a Japanese martial art Bojutsu were performed by a humanoid robot HRP-2 in 
the Prototype Robot Exhibition at Aichi Exposition 2005. 

2. Why and Where Is an Impulsive Force Needed? 

In order to show the advantages of using an impulsive force, a task of pushing a wall is 
taken as an example in this section. A model of a humanoid robot HRP-1 (the HONDA 
humanoid robot P3) is used in a simulation. 

Fig. 1 shows the snapshots in a simulation in which the humanoid robot HRP-1 quasi- 
statically pushes a wall, while Fig. 2 shows the snapshots in a simulation in which the 
HRP-1 dynamically pushes a wall moving a body forward. In the simulation illustrated in 
Fig. 1, the body is fixed so that the projection of the centre of gravity (COG) comes on the 
middle of the fore foot and rear foot, while in the simulation illustrated in Fig. 2, the body 
is moved so that the projection of COG moves from the centre of rear foot to the centre of 
fore foot. 

The results of the simulations are plotted in Fig. 3. Fig. 3 (a) shows the forces generated at 
the wrist (equal and opposite forces are generated on the wall) when the humanoid robot 
exerts a quasi-static force on a wall, while (b) shows the forces at the wrist when the 
humanoid robot dynamically exerts a force. 




(a) (b) (c) (d) 

Fig. 1. A humanoid robot quasi-statically pushes a wall. The body is fixed so that the 
projection of the center of gravity (COG) comes on the middle of the fore foot and rear foot, 
(a) at 0.0 [s], (b) at 2.0 [2], (c) at 4.0 [s], and (d) at 6.0 [s]. 
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Fig. 2. A humanoid robot pushes a wall moving the body to apply an impulsive force. In 
order to accumulate momentum, the body is moved so that the projection of COG moves 
from the center of rear foot to the center of fore foot, (a) at 0.0 [s], (b) at 2.0[2], (c) at 4.0 [s], 
and (d) at 6.0 [s]. 

As seen in Fig. 3, when the humanoid robot dynamically exerts a force on a wall, 
approximately 1.5 times larger force is generated compared with the case when the 
humanoid robot quasi-statically exerts a force. 

There is a strong demand for the formulation of the impact dynamics of a humanoid robot 
to solve the following problems: 

• Working postures: An optimum working posture at the impact tasks must be 
analyzed in order to minimize the angular momentum caused by an impulsive 
force. The angular momentum is more crucial than the translational momentum, 
because a humanoid robot easily falls down by a large angular momentum. 

• Impact motion synthesis: Appropriate impact motions of a humanoid robot must be 
synthesized based on multibody dynamics, to exert a large force on an 
environment. 

• Stability analysis: Exerting a large force on an environment, a humanoid robot must 
keep the balance. Therefore, stability analysis for the impact tasks is inevitable. 

• Shock absorbing control: In order to minimize the bad effect caused by the 
discontinuous velocity, shock absorbing control algorithms must be studied. 

• Enrichment of applications: Applications of the impact tasks must be developed to 
clearly show the advantages of using the impulsive force. 
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Fig. 3. Force generated at the wrist, (a) When the humanoid robot exerts a quasi-static force 
on a wall, (b) When the humanoid robot exerts an impulsive force on a wall. 
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3. A Humanoid Robot HRP-2 and Control System Software 

3.1 Specifications of the HRP-2 

A humanoid robot HRP-2 was developed in the Humanoid Robotics Project (1998-2002) 
being supported by the Ministry of Economy, Trade and Industry (METI) through New 
Energy and Industrial Technology Development Organization (NEDO). The total 
robotic system was designed and integrated by Kawada Industries, Inc. and Humanoid 
Research Group of the National Institute of Advanced Industrial Science and 
Technology (AIST). 

The height and weight of the HRP-2 are respectively 154 cm and 58 kg including 
batteries. The HRP-2 has 30 degrees of freedom (DOF). Please see the official web 
page of the HRP-2 (http://www.kawada.co.jp/global/ams/hrp_2.html ) for more 
details. 

In order to perform the drum beating and Bojutsu, small modifications are applied to the 
HRP-2. The arrangement of the wrist DOF is modified from the original, i.e. the last DOF at 
the wrist is pronated 90 °. Furthermore, gloves are developed and attached to the hands to 
grip firmly the sticks. 

3.2 Control system software 

The control system software of the HRP-2 is supplied and supported by General Robotics 
Inc. The control system software provides a controller that can be used with the CORBA 
servers of OpenHRP (Hirukawa et al., 2003). As shown in Fig. 4, the controller is composed 
of many plugin softwares. The control system software also includes the I/O access library 
to access the lower level functions of the robot and a VRML simulator model of the HRP-2 
and various utilities. 





ntroller main routine 
(hrpsys) 



Auditor 



V-HRP Controller 




Fig. 4. Control system software of the HRP-2 with OpenHRP (the figure is quoted from 
http://www.generalrobotix.com/product/openhrp/products_en.htm). 

Foundational plugins such as Kalman Filter, Sequential Playback, Walk Stabilizer, Pattern Generator, 
Dynamics, Logger, and ZMPSensor are also included in the control system software, however, 
users can develop own functions as a plugin to enrich the humanoid robot motions. Please see 
the official web page http://www.generalrobotix.com/ product/ openhrp/ products_en.htm for 
more details of the control software. 
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4. Drum Beating 

4.1 Primitive poses and motions 

In order to generate drum beating motions of the humanoid robot HRP-2, the motion is 

decomposed into four primitive poses or motions: (a) initial pose, (b) swing, (c) impact, and 

(d) withdrawing, as shown in Fig. 5. Among the four primitive motions, impact and 

withdrawing are important to exert an impulsive force. 

As presented in Fig. 6, three different swing patterns, (a) small swing, (b) middle swing and 

(c) big swing, are generated sharing the poses for the impact and withdrawing. 

For these swing patterns, three different initial poses are given and the poses to pass 

through in swing motion are designed. Cubic spline is used to interpolate the given 

poses. 




(a) (b) (c) (d) 

Fig. 5. Four primitive poses or motions in a drum beating, (a) Initial pose, (b) Swing, (c) 
Impact, (d) Withdrawing. 

4.2 Synchronization with music 

The swing motion must be synchronized with music in the drum beating. For the 
synchronization, a beat timing script is prepared for each tune. An example of the script is 
listed as follows: 

0.500 RS 

1.270 LM 

1.270 RM 

0.635 LS 

0.500 END 



The numbers listed in the first column indicate the interval (s) to the next beating. The 
symbols listed in the second column indicate the way of beating. The first character 'R' or 'L' 
indicates the arm to move (Right or Left), while the second character 'S', 'M', 'B', or 'E' 
indicates the kinds of swing (Small swing, Middle swing, Big swing, or Edge beating, see 
Fig- 6). 

For example, the third line of the script "1.270 RM" indicates "beat the drum after 1.270 s 
using the middle swing of the right arm." The period between the impact and the previous 
pose is fixed to 0.1 s to achieve the maximum speed at the impact. As shown in Fig. 6 (b), 
seven intermediate poses are designed for the middle swing between the initial pose and the 
impact, therefore, if the duration is specified to 1.270 s, each period ATm between the poses 
is calculated as follows: 
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duration -0.1 1.270-0.1 

^,= ; = ■ (1) 

number of poses 7 

The duration time varies depending upon a tune. 

There are two restrictions in the script: (i) the first beating must be RS (small swing of right 

arm), (ii) right arm and left arm must be alternating to beat. 
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Fig. 6. Three swing patterns. The periods between impact and the previous pose, and 
between withdrawing and impact are fixed to 0.1 [s]. Other periods denoted by ATs, ATm, 
ATg, are computed from the duration indicated in the beat timing script, (a) Small swing, (b) 
Middle swing, (c) Big swing. 

4.3 Control software 

Fig. 7 presents the flow of the control system. The components marked with red boundary 

boxes are developed in this work. 

Firstly, wav files of the three tunes are prepared: (i) ware wa umi no ko (I am a son of the sea), 

(ii) Tokyo ondo (Tokyo dance song), and (iii) mum matsuri (village festival). They are very old 

and traditional tunes, and thus, copyright free. As soon as the Speak Server receives a queue 

from the robot control system, the server starts playing the tune. The queue is used to 

synchronize the tune with the drum beating motion. 

Secondly, the timings of beating are scheduled by hand. In order to strictly count the 

timing, a time keeping software is newly developed. The time keeping software counts the 

rhythm of a tune. The timings of the beating are described in a script file as mentioned in 

Section 2. 

Thirdly, a plugin software is developed as a shared object to generate drum beating motions 

interpreting the beat timing script. 

Fourthly, interpolating the given poses presented in Fig. 6 using cubic spline, trajectories of 

all joints are produced online. The produced trajectories are given to the humanoid robot 

through a plugin SeqPlay. 
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developed in this work. 
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4.3 Resultant joint trajectories 

The reference and resultant joint trajectories of the elbow and wrist joints of the right arm 
are plotted in Fig. 8. The error in the impact time was approximately 30 [ms], which was not 
significant in the synchronization with music. 
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Fig. 8. A software diagram. The components marked with red boundary boxes are 
developed in this work. 
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As can be seen in Fig. 7, during the last 0.1 [s] before the impact (approximately from 0.5 to 
0.6 [s]), gradients of the joint trajectories are steep compared with other periods. Since the 
period between the impact and the previous pose is set to 0.1 [s], maximum joint speed is 
almost achieved. 

5. A Japanese Martial Art Bojutsu 

In martial arts, impulsive forces are frequently used to fight with an antagonist. A Japanese 
martial art Bojutsu was also demonstrated by the humanoid robot HRP-2 in Aichi 
Exposition, although an impact was not performed in the demonstration. Some dynamic 
motions used in the demonstration are presented in Fig. 9. 
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Fig. 9. The Japanese martial art Bojutsu motion patterns, (a) Thrusting a staff weapon 
rightward. (b) Thrusting a staff weapon leftward, (c) Banging down a staff weapon. 

6. Demonstration at Aichi Exposition 



The Prototype Robot Exhibition was held for 11 days from June 9 to 19, at the Morizo and 
Kiccoro Exhibition Center, a convention venue in the Aichi Expo site. The Prototype Robot 
Exhibition was organized by the Japan Association for the 2005 World Exposition and the 
New Energy and Industrial Technology Development Organization (NEDO). 63 prototypes 
performed demonstrations during the period. 
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The drum beating and Bojutsu demonstration was performed twice a day in the Prototype 
Robot Exhibition (Fig. 10). 





Fig. 10. Demonstrations at Aichi Exposition 
Japanese martial art Bojutsu performance. 



2005. (a) Drum beating performance, (b) A 



7. Conclusion 

This chapter proposed to utilize an impulsive force for humanoid robots to exerts a large 
force beyond the torque limitations of actuators. The problems of the impact tasks to be 
solved in the future work were brought up in Section 2. 

A drum beating is taken as a case study, because it is a typical task that requires large 
impulsive forces. The details of the drum beating and a Japanese martial art Bojutsu performed 
by a humanoid robot HRP-2 in the Aichi Exposition were presented in this paper. 
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1. Introduction 

This chapter presents recent research results of our laboratory in the area of vision and 
locomotion coordination with an emphasis on foveated multi-camera vision. A novel active 
vision planning concept is presented which coordinates the individual devices of a foveated 
multi-camera system. Gaze direction control is combined with trajectory planning based on 
information theoretic criteria to provide vision-based autonomous exploring robots with 
accurate models of their environment. 

With the help of velocity and yaw angle sensors, mobile robots can update the internal 
knowledge about their current position and orientation from a previous time step; this 
process is commonly referred to as dead-reckoning. Due to measurement errors and 
slippage these estimations are erroneous and position accuracy degrades over time causing 
a drift of the estimated robot pose. To overcome the drift problem it is common to take 
absolute measurements evaluating visual information, which are fused dynamically with 
the odometry data by applying Kalman-filter or other techniques, e.g. (Dissanayake et al., 
2001). The use of active vision systems for navigation is state-of-the-art providing a 
situation-related selective allocation of vision sensor resources, e.g. (Davison & Murray, 
2002; Seara et al., 2003; Vidal-Calleja et al., 2006). Active vision systems comprising only one 
type of vision sensor face a trade-off between field of view and measurement accuracy due 
to limitations of sensor size and resolution, and of computational resources. In order to 
overcome this drawback the combined use of several vision devices with different fields of 
view and measurement accuracies is known which is called foveated, multi-resolution, or 
multi-focal vision, e.g. cf. (Dickmanns, 2003; Kuhnlenz et al., 2006; Ude et al., 2006). Thereby, 
the individual vision devices can be independently controlled according to the current 
situation and task requirements. The use of foveated active vision for humanoid robot 
navigation is considered novel. 

Active vision is also frequently utilized in the context of robotic exploration. Yet, gaze control 
and locomotion planning are generally decoupled in state-of-the-art approaches to simultaneous 
localization and mapping (SLAM). An integrated locomotion planning and gaze direction 
control concept maximizing the collected amount of information is presented in the second part 
of this chapter. This strategy results in more accurate autonomously acquired environment 
representations and robot position estimates compared to state-of-the-art approaches. 
The chapter is organized as follows: In Section 2 vision-based localization and mapping in 
the context of humanoid robots is surveyed; Section 3 is concerned with foveated multi- 
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camera coordination; novel concepts of gaze control and path planning coordination are 
presented in Section 4; evaluation studies comparing the novel concepts to conventional 
planning approaches and vision systems are presented in Section 5; conclusions are given in 
Section 6. 



2. Vision-Based Localization and Mapping for Humanoid Robots 

Most state-of-the-art humanoid robots are equipped with vision systems. The benefits of 
using these vision systems for providing absolute measurements of the robot pose in the 
environment are obvious: pose information on landmarks is provided and no additional 
devices as, e.g., laser scanners are necessary. Being equipped with internal sensors - angular 
sensors in the joints and widely used gyros and accelerometers in the trunk - humanoid 
robots are basically capable of dead-reckoning, i.e. the ability to update position and 
orientation known from previous measurements. Thus, common simultaneous localization 
and mapping techniques are applicable which are covered by common literature, e.g. (Sabe 
et al., 2004; Ozawa et al., 2005; Thomson & Kagami, 2005; Stasse et el., 2006). 



tJ° 



OH 







oh 



Fig. 1. Humanoid robot navigation scenario. 

A fundamental aspect in simultaneous localization and mapping for humanoid walking is 
the formulation of a state-space model accounting for the footstep sequences of the robot. In 
vision-based SLAM, the system state, i.e. the robot pose and environment point positions, 
are predicted based on the dead-reckoning model of the mobile robot. Common Kalman- 
filter techniques are applied in order to obtain more accurate estimations accounting for 
uncertainties in the robot locomotion. Whenever visual measurements of environmental 
points are taken, updates of the robot state are computed. Changing ground contact 
situations of the feet, however, result in different kinematic chains from a world reference 
frame to measured environment points. This discontinuous movement of the humanoid 
robot requires an adaptation of the filter formulation. In earlier works we proposed a hybrid 
formulation of the state-space model in order to capture this locomotion principle (Seara et 
al., 2003). Thereby, the robot reference frame is placed in the foot currently in contact with 
the ground and is switched whenever the supporting foot changes. The dead-reckoning 
model is expressed by 
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x k+i = x k( l -rk) + fs( x k< u k> d x,k)n (1) 

where state-vector x contains the robot foot pose and the landmark positions, d represents 
system noise capturing dead-reckoning uncertainties, and J€{0;1} is a binary variable 
indicating a change of the supporting foot when y=l. The commanded step u is expressed 
by 

u k=iF x s,k FVs.k F&s,k\ ^\ 

including the commanded step position [x s y s ] T and orientation S with respect to the current 
supporting foot frame Sf. Figure 1 schematically shows a typical SLAM situation of a 
humanoid robot with the reference frame currently placed in the left foot. 
In vision-based SLAM field of view restrictions of the vision device strongly limit the 
number of landmarks to be observed simultaneously. Yet, a larger field of view can only be 
realized accepting a lower measurement accuracy of the vision device mainly due to 
limitations of sensor size and resolution. Therefore, we propose the use of several vision 
devices which provide different fields of view and accuracies and a novel gaze control 
concept for coordinating the individual vision devices in order to provide both, large field of 
view and high measurement accuracy, simultaneously. These foveated active vision 
concepts for robot navigation are discussed in the following section. 

3. Foveated Multi-Camera Coordination 

3.1 Active Vision in SLAM 

In order to gather an optimal situation-dependent amount of information the control of the 
vision system pose is common. To date, there are only few works in the area of active 
vision-based SLAM, e.g. (Davison & Murray, 2002; Se et el., 2002; Vidal-Calleja et el., 2006) 
which are based on measures representing the information gathered with respect to the 
SLAM task. All these approaches are greedy strategies only evaluating the current situation 
without considering future planning steps. In order to obtain an optimal gaze direction 
considering also some future planning steps, we proposed a gaze direction planning 
strategy with limited time horizon (Lidoris et al., 2006). Furthermore, in earlier works (Seara 
et al., 2003) we introduced a gaze control strategy considering concurrent tasks, localization, 
and obstacle avoidance for humanoid robots in order to account for navigation in physical 
environments. 

3.2 Foveated Active Vision 

Vision systems comprising only one type of vision sensors face a tradeoff between 
measurement accuracy and field of view due to limitations of sensor size and computational 
resources for image processing. Accuracy and field of view are mainly determined by the 
focal-length of the lens or mirror optics, respectively. Within the context of robot navigation 
this tradeoff implies a compromise between localization accuracy and keeping a large part 
of the scene in view. 

With an active vision system this tradeoff could be compensated providing that a 
sufficiently accurate map of relevant landmarks or structures of interest to be observed is 
known a priori. Then the highest available focal-length and, thus, the highest measurement 
accuracy could be chosen. If additionally very fast gaze shifts can be realized, the narrow 
field of view would be acceptable as visual attention can be directed dynamically towards 
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the most relevant structure in the current situation. Yet, in a variety of scenarios this 
approach is unsuitable or even unrealizable. In at least partially unknown environments 
and in exploration scenarios a sufficient map is not available and thus has to be created 
online. However, due to the strongly limited field of view the detection of new objects of 
potential interest is hardly possible. Another aspect are potentially relevant or even 
dangerous objects or activities in the local surroundings of the robot which cannot be 
detected. 

In order to overcome the common drawback of trading field of view versus measurement 
accuracy, the combination of wide-angle and telephoto vision devices has been suggested. 
Such systems provide at the same time both, an observation of a large part of the 
environment and a selective examination with high accuracy. In common literature these 
systems are referred to as foveated, multi-resolution or multi-focal systems. The individual 
vision devices may be fixed with respect to each other or may be independently motion 
controllable in one or more degrees of freedom. Most common embodiments of foveated 
systems are used in state-of-the-art humanoid robots comprising two different cameras 
combined in each eye which are aligned in parallel, e.g. (Brooks et al., 1999; Ude et al., 2006; 
Vijayakumar et al., 2004). Systems for ground vehicles, e.g. (Apostoloff & Zelinsky, 2002; 
Maurer et al., 1996; Dickmanns, 2003) are another prominent class. An upcoming area are 
surveillance systems which strongly benefit from the combination of large scene overview 
and selective observation with high accuracy, e.g. (Bodor et al., 2004; Davis & Chen, 2003; 
Elder et al., 2004; Jankovic & Naish, 2005; Horaud et al., 2006). An embodiment with 
independent motion control of three vision devices with a total of 6 degrees-of-freedom 
(DoF) is the camera head of the humanoid robot LOLA developed at our laboratory which is 
shown in Figure 2 providing more flexibility and, due to directly driven gimbals, faster 
camera motions than other known systems, cf. e.g. (Kuhnlenz et al., 2006). 




tefephoto 
:tef£D 



Fig. 2. Multi-focal vision system of humanoid LOLA (Kuhnlenz et el. 2006). 

Most known methods for active vision control in the field of foveated vision are concerned 
with decision-based mechanisms to coordinate the view direction of a telephoto vision 
device based on evaluations of visual data of a wide-angle device. For a survey on state-of- 
the-art methods cf. (Kuhnlenz, 2006). A first approach towards a coordination of foveated 
multi-camera view direction planning for humanoid walking has been investigated in our 
laboratory which is presented in the following sections. 
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Fig. 3. Humanoid robot navigation scenario with multi-camera vision. 



3.3 Considerations for Camera Coordination 

In the area of foveated vision a large body of literature exists covering mechanisms to assess 
peripheral visual data in order to derive control commands to direct foveal attention 
towards regions of potential interest. The most prominent computational approaches in the 
biologically inspired field are computational neuroscience models of top-down modulated 
bottom-up attention weighting particular visual features of the environment, e.g. (Koch & 
Ullmann, 1984; Itti & Koch, 2001). In the technical field a larger variety of different methods 
is known. Common approaches solve optimization problems, assess the visual information 
content, or evaluate the environment towards particular visual features, e.g. (Bodor et al., 
2004; Darrell, 1997; Pellkofer & Dickmanns, 2000; Scasselati, 1998; Shibata et al, 2001). To 
date, only few works have been presented on foveated and multi-camera attention 
considering locomotion tasks. Prominent examples are the works of (Pellkofer & 
Dickmanns, 2000) in the field of visual guidance of autonomous ground vehicles and gaze 
control concepts for the humanoid LOLA conducted in our laboratory (Kiihnlenz, 2006), 
where optimal view directions are determined by maximizing the information gain. 
In earlier works we proposed a task-related information measure as quality measure termed 
incertitude (Seara et al., 2003) which has been taken as the basis for the coordination of the 
two stereo-camera devices of LOLA with different characteristics. The mission of the 
humanoid robot is a locomotion task to walk along a certain path or to explore the world. A 
primary condition for view direction planning, thus, has to consider the quality of 
locomotion task accomplishment in order to determine an optimal view direction for the 
next time step. The concept of incertitude captures this task-dependence by evaluating the 
predicted certainty of the estimated robot foot pose. Therefore, the average of the main axes 
lengths of the foot pose covariance matrix confidence ellipsoid is computed 



v {) . 



i2> 



(3) 



where counter i covers the considered components of the foot pose and e, are the 
eigenvalues of the predicted foot pose covariance matrix P uu which is a submatrix of the 
predicted covariance matrix of a possible target state as estimated by the Kalman-filter, e.g. 
cf . (Dissanayake et el., 2001) 
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where P l U u is the error covariance matrix of the robot state estimate, P i mm is the map 
covariance matrix of the landmark state estimates and P' um is a cross-covariance matrix 
between robot and landmark states. Low values of the defined measure (3), thus, indicate a 
high certainty of the robot pose estimation and, therefore, good task performance for the 
locomotion task. Additional measures to assess the performance of secondary tasks have 
been proposed which also may have an indirect impact on the performance of the primary 
(locomotion) task, e.g. field of view limitations, presence of activities, etc., (Kuhnlenz, 2006). 
These measures are all extensions to the central gaze control concept and, therefore, out of 
scope of this chapter. 

Given such measures to assess the task performance of the humanoid robot the next task is 
to derive appropriate view directions for the individual vision devices in the following time 
step in order to achieve a particular desired task performance. This gaze control concept is 
topic of the following section. 

3.4 Multi-Camera View Direction Planning 

Common approaches to optimal view direction planning for mobile systems are based on a 
maximization of the information gain, e.g. (Davison, 1998; Pellkofer & Dickmanns, 2000; 
Seara et al., 2003), in order to determine either a selected gaze shift or a sequence of gaze 
behaviors. Particularly, in the field of foveated and multi-camera vision also visibility 
conditions are considered, e.g. (Pellkofer & Dickmanns, 2000; Kuhnlenz, 2006). 
The basic principle of multi-camera coordination in this chapter is an information 
maximization over a set of possible view directions of independent vision devices. The 
assumed task of the robot is to follow a path as closely as possible. As a consequence the 
estimation error of the robot pose within the environment during its motion has to be 
minimal in order to complete the mission optimally. The presumed objective for view 
direction planning is to gather the largest possible amount of information with respect to the 
task to be accomplished. An information gain corresponds to a reduction of uncertainty. In 
order to maximize the information gain the robot pose error has to be minimized by 
selecting appropriate view directions of the individual cameras of the foveated multi- 
camera vision system. Following this, an optimal configuration of view directions for the 
locomotion task in the next time step satisfies the condition of minimizing the robot pose 
estimation error. In terms of the task-related information measure defined in the previous 
section this gaze control strategy can be expressed by 

£2 = arg min v 

" , (5) 

where Q=\pani tiltj ... pan n tilt n ] T is a configuration of pan- and tilt-angles of all vision 
devices, Vo is the incertitude information measure defined in the previous section, and (.)* 
denotes the optimal value. This method constitutes an extension to our earlier works on 
gaze control for humanoid robots (Seara et al., 2003) generalizing them to multi-camera 
vision systems. In Section 6, a comparative evaluation of this strategy is presented assuming 
a humanoid robot navigation scenario with sparsely distributed point landmarks. 
The presented gaze control strategy considers a preplanned path of the humanoid robot 
which is not altered as the robot moves. The following section is concerned with combined 
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planning of gaze direction and locomotion path in order to provide the mobile robot with 
capabilities of exploring unknown environments. 

4. Combined Gaze Direction and Path Planning 

In the previous section a foveated approach to active vision has been presented which 
optimally controls the devices such that the robot pose error is minimized. This section is 
concerned with a novel approach which combines locomotion planning and gaze direction 
control concepts in order to improve autonomous robotic exploration. 

4.1 Locomotion Planning for Exploration and SLAM 

Robotic exploration is largely understood as investigating an unknown environment such 
that the area is covered by the robot sensors and a representation is generated allowing the 
robot to perform its tasks with a certain amount of confidence. Early approaches focused on 
generating motion commands which minimize the time needed to cover the whole terrain. 
This was achieved by extracting frontiers between known and unknown areas (Yamauchi, 
1998; Koenig & Tovey, 2001) and visiting the nearest unknown area. Such approaches only 
distinguish between previously visited and unknown terrain without taking into account 
the amount of information gathered after each action. To incorporate the uncertainty about 
the state of the environment, (Moorehead et al., 2001) try to minimize the uncertainty of the 
robot about grid cells, by using entropy as a criterion. Further, (Grabowski et al., 2003) 
present an approach in which the robot is forced to observe obstacles from different 
viewpoints so that sharper boundaries between objects and free-space are acquired. 
However, the techniques mentioned above assume the location of the robot as known. 
Recently, some techniques have been proposed which actively control the motion of the 
robot while simultaneously creating a map of the environment and localizing the robot in it. 
In (Feder et al., 1999) information gain is introduced as a measure of utility for locally 
deciding on exploration actions. Formal information measures, as discussed in the 
introduction of this section, can be used to quantify uncertainty and therefore evaluate the 
effect of future control actions on the quality of the robot state estimate. Therefore, 
(Bourgault et al., 2002) introduced a utility function which trades off the cost of exploring 
new terrain with the utility of selecting future positions that reduce uncertainty. In 
(Stachniss et al., 2005) a similar decision theoretic framework is used in combination with a 
particle filter based SLAM solution for robotic exploration. Further, (Bryson & Sukkarieh, 
2005) present simulated results which demonstrate the effect of different actions to 
information gain for unmanned aerial vehicles performing vision-based SLAM. 
All the approaches mentioned above, perform a greedy optimization based on information 
theoretic criteria for the trajectory generation only over the next time step. However, some 
planning approaches have been introduced which demonstrate improved performance. 
Such a planning approach that introduces a new measure of map quality is described in 
(Sim & Little, 2006). However, some initial state estimate of all the landmarks is assumed 
and sensors are assumed to have an unlimited field of view. Another multi-step planning 
algorithm for SLAM is described in (Huang et al., 2005). 

4.2 Combined Gaze Direction and Path Planning 

Most conventional approaches in autonomous exploration and active vision either control 
the motion of the robot or the active sensors. In this chapter, we adapt the control inputs of 
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the robot and the sensory system simultaneously so that the best state estimates possible are 
acquired and as much new terrain as possible is explored. 
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Fig. 4. Proposed motion and gaze direction control scheme 

Figure 4 illustrates the proposed motion and gaze direction control scheme. The robot and 
its active vision system are controlled by two modules which use a common model of the 
environment. For trajectory planning, a multi-step prediction algorithm is introduced in 
order to evaluate all possible positions that can be reached by the robot over a finite given 
time horizon. This estimation forms a multi-attribute function which is used to decide 
where the robot should move next. A trade-off is made between localization, map accuracy, 
and proactivity of exploration. For the gaze direction control a greedy information-based 
optimization is used to choose those view directions that minimize position and map 
uncertainties. The robot depends on noisy data gained from the visual sensors and at the 
same time its actions affect the quality of the collected data and its environment. 
For vision-guided robots one definition for optimally using sensory resources is selecting 
the next gaze direction such that measurements are obtained which are most informative 
about the state of the environment and the robot. This raises the question how information 
gain can be measured. A common measure of uncertainty is entropy. Which has been 
introduced by (Shannon, 1948). Entropy for a multivariate Gaussian distribution p(x), with 
covariance P is defined as 

H(p(x)) = i-log((2*)"|P|) 

2 (6) 

Since the determinant of a matrix is a measure for its volume, the entropy measures the 
compactness and, thus, the informativeness of a distribution. In order to measure the utility 
of a gaze direction which will result in an observation z, we will use the mutual information 
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gain I[x,z]. The gain of information between any two distributions can be computed as the 
change in entropy. In our case this change is the difference between the entropies of the 
state estimates before and after making an observation which are both multivariate 
Gaussians with covariances Pk+i\k and Pjt+i|fc+l. Therefore, the information gain evaluates to 

I[x,z] = H(x)-H(x\z) = hog(\P k+nk I) - ilog(k +1 |* +I I) 

2 I I 2 I I (7 ) 

This information gain can be calculated as a function of the state covariance matrix. From (7) 
it is obvious that information gain I[x,z] becomes maximal, if the determinant of P/ ( +i|;t+i is 
minimized. Starting from the current state estimate the covariances of the states that can be 
observed next by the vision sensors are predicted. The equations for the prediction step of 
the classical SLAM algorithm based on an extended Kalman-filter (Dissanayake et al., 2001) 
are used. After all covariances are predicted the most informative state can be calculated by 
minimizing |Pit+i|it+i | . The new optimal gaze direction £2 of the active vision system 
corresponding to this state is then computed. 
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Fig. 5. Region covered while planning over a horizon of N steps. Highlighted grid cells 
show which cells are taken into account for gaze direction control. 

The first step for choosing the next destination for the robot is to estimate the states and 
covariances of all possible positions that can be reached over its planning horizon. A 
discretized grid environment is used, where each grid represents a position that can be 
reached by the robot over future time steps. Therefore, the size of the grid cells depends on 
the physical properties of the robot. Based on this discretized environment the states and 
their covariances are computed. While the robot moves, observations are made and used to 
update the state estimation. This way, all available information is being used. More 
specifically, based on an initial state estimate and covariance matrix, we calculate all 
possible robot states and their covariances after N time steps and choose to move to the one 
that is most informative, namely the one that minimizes relative entropy as described in the 
previous section. A mathematical description of the algorithm used to produce the multi- 
step predictions, can be found in (Lidoris et al., 2007). The estimation procedure evolves in a 
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square-like manner, as shown in Figure 5. Starting from the currently estimated state the 
first eight neighboring states and co variances are computed. At each step, the estimated 
state and covariances of the neighboring states are used to infer the next ones until step N. 
By always using the nearest neighbor in the estimation process, the estimation error is kept 
minimum. Over each time step k, 8k new states are calculated. The control signal, uh,k 
required in order to drive the robot from a state j to a state i, is chosen as indicated by the 
arrows in Figure 5. 

Using the predicted covariance matrix (4) and the concept of relative entropy mentioned 
previously, each possible future position of the robot can be evaluated to choose the 
appropriate target position for the robot. The destination that maximizes the function 

* UU L mm /Q\ 

is chosen as a target for the robot. The first part of the function is a measure of the position 
uncertainty the robot will encounter in the future position and the second part is a measure 
of the map quality. The constant y can be used to adjust the behavior of the robotic explorer. 
Setting y to values smaller than one, results in a conservative exploration policy, since the 
robot will stays near to well-localized features giving more attention to localization. Large 
values of y increase the proactivity of the explorer in the sense that it moves to unknown 
areas neglecting the lower localization accuracy. After selecting the target position which 
maximizes (8), the robot moves making observations which are used to update the 
estimated state and covariance. Each time a new state estimate is available, a recalculation of 
the gaze direction is made. This way we use all new information that becomes available 
during robot motion. Replanning takes place after N time steps when the target position is 
reached. 

5. Comparative Evaluation Studies 

In Sections 3 and 4 novel concepts of foveated active vision and combined gaze and 
locomotion coordination have been presented. This section is concerned with evaluation 
studies in order to assess the performance of the proposed approaches in comparison to 
state-of-the-art planning methods and vision systems. 

5.1 Foveated Active Vision 

In Section 3 a task-related information measure for the humanoid robot locomotion task and 
a multi-camera view direction planning strategy have been introduced. This section is 
concerned with an evaluation study comparing the performance of the novel approach to a 
conventional single stereo-camera strategy. 

Considered is a typical locomotion task of a humanoid robot with the robot moving along a 
planned path. It has visual and odometrical capabilities such that it is able to localize itself and 
other objects within the environment. The robot is equipped with a foveated multi-camera 
vision system consisting of two stereo-camera devices with independently controllable pan- 
and tilt-angles, different focal-lengths, and different fields of view. The robot's mission is to 
follow the desired path. Therefore, it has to localize itself continually evaluating odometry 
data and visual information. Given a particular environmental situation, i.e. configuration of 
observable objects and robot pose, the objective is to dynamically select appropriate view 
directions for both vision devices. Figure 3 exemplarily shows a situation in the considered 
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navigation scenario where a humanoid robot fixates two landmarks with two vision devices of 
its foveated multi-camera vision system in order to localize itself in the world. 
In order to demonstrate the benefits of foveated multi-camera view direction planning the 
proposed gaze control approach is now evaluated in a structured humanoid robot navigation 
scenario. Several vision system configurations are evaluated by comparison of the achieved 
navigation performances. The basic scenario is shown in Figure 6. Four landmarks are 
distributed within a rectangular environment. The mission of the robot is to follow the 
planned path in x-direction. In order to complete the mission successfully the robot has to 
localize itself within the environment evaluating available visual information on the positions 
of the identified landmarks. The robot pose is estimated dynamically using the Kalman-filter 
approach described in Section 2. In order to maximize the information gain optimal view 
directions of the individual vision devices are selected dynamically based on the proposed 
approach in Section 3.4. The positions of the landmarks are not known a priori nor are the 
number of landmarks. Configurations of the vision system in the considered scenario to be 
compared are: a) conventional single stereo-camera, focal-lengths 20mm, aperture angles 30°, 
stereo-base 25cm; b) foveated stereo-camera with two cameras per eye aligned in parallel, 
focal-lengths 2mm and 40mm, respectively, aperture angles 60° and 10°, respectively, stereo- 
bases 25cm; c) two independent stereo-cameras, focal-lengths 2mm and 40mm, respectively, 
aperture angles 60° and 10°, respectively, stereo-bases 25cm. All cameras are ideal, based on 
the pinhole camera model neglecting lens distortion and quantization effects. Gaussian vision 
sensor noise with a standard deviation of 1 pixel is considered. Dead-reckoning errors are 
taken from experiments with the humanoid robot JOHNNIE. 
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The navigation performance is rated assessing the localization accuracy. Therefore, the 
covariance matrix of the robot position is evaluated computing the areas of the 90%- 
confidence ellipses of the footstep position uncertainties of the humanoid robot. Figure 7 
contains a cut-out of Figure 6 showing the planned and real paths, the path estimated by the 
Kalman-filter, the foot step positions, and their covariance ellipses. It is noted that due to 
dead-reckoning errors the real path deviates increasingly from the planned path as 
locomotion control is open loop. The estimated path follows the real path well. 
Figure 8 and Figure 9 show the resulting view directions for each step of the robot for the 
individual vision systems. The propagations of the areas of the confidence ellipses are 
shown in Figure 10. Table 1 shows a comparison of the means of the confidence ellipse areas 
and the average number of landmarks visible for the vision systems for all scenarios. These 
results are discussed in the following. 




3 4 

x[m] 

Fig. 8. Pan-angles of the a) conventional stereo-camera and b) of the foveated stereo-camera 
with two cameras per eye. 

Mono-focal vision systems, i.e. systems comprising only one sensor type, suffer from a 
trade-off of accuracy versus field of view. In robot localization not only measurement 
accuracy, but also the number of visible landmarks is an important factor in order to 
determine the current robot position. Depending on the distribution of landmarks and the 
current situation it may be better to observe more landmarks with lower accuracy in one 
situation and fewer landmarks with higher accuracy in another. Thus, mono-focal 
systems are always a compromise working well only for a very limited class of 
environmental conditions and situations, however, failing in others. This problem is 
reflected by the results shown in Figure 8a and Table 1. The upper two of the sparsely 
distributed landmarks are not detected at the initial position and, thus, the planner only 
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considers the lower landmarks. Most of the time only one landmark is visible and used 
for localization. 

The foveated vision systems provide much more flexibility. Due to the large field of view of 
the wide-angle device more landmarks are detected in the initial position to be considered 
by the planner. So, at each step two or more landmarks are used for localization whereas at 
least one landmark is focused with telephoto cameras resulting in a significantly higher 
certainty of the estimated footstep positions of the humanoid robot as shown in Figure 10 
and Table 1. 
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Fig. 9. Pan-angles of foveated vision system with two independent stereo cameras; a) 
telephoto and b) wide-angle stereo-camera. 
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Fig. 10. Comparison of the areas Ago of the 90% -confidence ellipses of the footstep position 
covariance matrix using a conventional and a foveated vision system. 
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vision system 


Ag [l(Hm2] 


average number of visible 
landmarks 


a) conventional 


2.7 


1.1 


b) foveated 


1.7 


2.3 


c) foveated 


1.6 


2.4 



Table 1. Mean of the areas Ago of the 90% -confidence ellipses of the footstep covariance 
matrix and average number of visible landmarks. 



5.2 Combined Gaze and Trajectory Planning 

In this section the performance of the proposed combined gaze direction and motion 
planning scheme described Section 4 is evaluated. The simulated environment consists of an 
area of size 20x20 meters with randomly allocated features. The active simulated head 
which is mounted on top of the robot is assumed to have a field of view of 60° and a 
maximum viewing range of 6 meters. No previous knowledge of the environment is 
assumed. Since it is out of the scope of this paper to deal with the correspondence problem 
of SLAM or feature selection, feature association is considered known and all observed 
features are used. A harsh odometry error of 10% is chosen, since are interested to see how 
our algorithm performs with very inaccurate robot models. A sensor model with a variance 
proportional to the distance for bearing and range measurements is used having the same 
high noise level. The active head can be moved with high angular velocities, so that saccadic 
movements are simulated. Finally, y in (8) is chosen heuristically such that the robot 
balances between keeping good pose estimates and exploring the environment. The 
simulations time is set to 60s. 

We first ran a simulation assuming a passive sensor, directed always straight ahead of the 
robot and a greedy policy for motion control which considers only the eight neighboring 
states. Only seven features were observed and map and localization uncertainty were very 
high as shown in Figure 11. 

Next we conducted simulation with the proposed gaze direction control and again a greedy 
policy, followed by a simulation with a three-step planning horizon. In Figure 11 the 
absolute position error is depicted, for all three cases. It is evident that error reduces 
significantly as the planning horizon grows and gaze direction control is used. 
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Fig. 11. Absolute position error as a function of time for one-step planning horizon without 
gaze direction control and one-step and three-step planning horizons with gaze direction 
control. 
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The map accuracy is illustrated in Figure 12 by the error ellipsoids for each observed feature 
for the final map. It is obvious that map accuracy grows as the planning horizon becomes 
larger. Also more features are observed if gaze direction control is used. From the final map 
acquired in the case of a three-step planning horizon with gaze direction control, it becomes 
clear that the proposed approach balances well by observing a large number of features and 
also building an accurate map. 
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Fig. 12. Map accuracy illustrated by the error ellipsoids of each observed feature for the final 
map; a) one-step planning horizon without gaze direction control, b) one-step, and c) three- 
step planning horizons with gaze control. The estimated robot trajectory is illustrated by the 
black lines, while the red triangle on-top of the robot represents the active head and its gaze 
direction. 

Figure 13 shows the reduction of the entropy over time. Each time a new feature is observed 
entropy reduces. For that reason it is step-formed. The greedy approaches need more time 
to reduce entropy and the larger the planning horizon is, the more entropy is reduced. 
Furthermore, when the planning horizon is small, more time is needed to observe the same 
number of features. Without gaze direction control entropy is not satisfactorily reduced. 
This results from the fact, that the gaze direction control module chooses to direct the sensor 
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system mostly towards already observed and more certain features when the environment 

is known. Therefore, localization error and feature position uncertainties are kept at a 

minimum. 

The results show that the proposed approach combining gaze direction control and motion 

planning based on information theoretic concepts for the exploration task, gives superior 

results in comparison to greedy approaches and approaches that neglect active sensor 

control. 
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Fig. 13. Entropy measure log( | Pk+i\k+i \ ) over time for a one-step planning horizon without 
gaze direction control, one-step and three-step planning horizons with gaze direction 
control. 



6. Conclusions 

This chapter presents a foveated active vision planning concept for robot navigation in 
order to overcome drawbacks of conventional active vision when trading field of view 
versus measurement accuracy. This is the first approach of task-related control of foveated 
active vision in the context of humanoid robots as well as localization and mapping. In a 
typical robot navigation scenario the benefits of foveated active vision have been 
demonstrated: an improved localization accuracy combined with an extended visible field 
compared to conventional active vision. As a generic information maximization principle 
has been used the gaze control strategy is generalizable to other scenarios depending on the 
definition of the task-related information measures, thereby, allocating vision sensor 
resources optimally. 

The second part of this chapter has presented novel concepts of coordinating gaze direction 
and locomotion planning. Through this planning procedure, all possible trajectories and 
viewing angles over a specific time horizon are anticipated. This results in significantly 
improved performance compared to known approaches, which neglect the limitations of the 
sensors. It is demonstrated that a very accurate environmental model is autonomously 
produced even if very inaccurate robot and sensor models are used. Such a scheme is ideal 
for vision-based humanoid robots. 

The proposed approaches assume time-invariant scenarios. Path planning and gaze control 
methods for dynamically changing environments are not yet covered and subject to ongoing 
research. 
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1. Introduction 

Vertical jumping is a complex task requiring quick and harmonized coordination of 
jumper's body segments, first for the push-off, then for the flight and lastly for the landing. 
The prime criterion for vertical jump efficiency is the height of the jump that depends on the 
speed of the jumper's center of gravity (COG) in the moment when the feet detach from the 
ground. Besides maintaining the balance, the task of the muscles during the push-off phase 
of the jump is to accelerate the body's COG up in the vertical direction to the extended body 
position. During the push-off phase of the jump, the jumper's center of gravity must be 
above the supporting polygon that is formed by the feet (Babic et al., 2001). In contrast to the 
humans, today's humanoid robots are mostly unable to perform fast movements such as the 
vertical jump. They can mostly perform only slow and statically stable movements that do 
not imitate the human motion. Besides, these slow and statically stable movements are 
energy inefficient. With the understanding of the anatomy and the biomechanics of the 
human body, one can find out that, beside the shape, majority of today's humanoid robots 
and human bodies do not have a lot of common properties. To achieve a better imitation of 
the human motion and ability to perform fast movements such as the vertical jump or 
running, other properties and particularities, beside the shape of the body, should be 
considered in the design of the humanoid robot. 

Lower extremities of today's humanoid robots are mostly serial mechanisms with simple 
rotational joints that are driven directly or indirectly by electrical servo drives. Such design 
of humanoid robot mechanism allows only rotational motion in joints to occur. This means 
that translations of the robot's center of gravity are solely a result of the transformation of 
rotations in joints into translations of the robot center of gravity. Especially in ballistic 
movements such as fast running or jumping where the robot center of gravity is to be 
accelerated from low or zero velocity to a velocity as high as possible, this transformation is 
handicapped. The transfer of the angular motion of the lower extremity segments to the 
desired translational motion of the robot center of gravity is less effective the more the joints 
are extended. When the joint is fully extended, the effect of this joint on the translational 
motion of the robot center of gravity in a certain direction equals zero. Besides, the motion 
of the segments should decelerate to zero prior to the full extension to prevent a possible 
damaging hyperextension. Where relatively large segments which may contain considerable 
amounts of rotational energy are involved, high power is necessary to decelerate the angular 
motion. 
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Servo drives that drive the joints of the humanoid robot represent a model of the group of 
monoarticular muscles that are passing a single joint and thus cause a rotational motion of 
the joint that are passing. Beside the monoarticular muscles that are passing only one joint, 
human lower extremity consists of another group of muscles that are passing two joints. 
These muscles are so called biarticular muscles. Their functions during human movement 
have been studied extensively by many researchers. One of such functions is the 
transportation of mechanical energy from proximal to distal joints. It is believed that this 
transportation causes an effective transformation of rotational motion of body segments into 
translation of the body centre of gravity (Schenau, 1989). Gastrocnemius muscle is a 
biarticular muscle that is passing the knee and the ankle joints and acts as a knee flexor and 
ankle extensor (see Fig. 1). 



gastrocnemius 
elastic tendon 



Fig. 1. Biarticular muscle gastrocnemius passing the knee and the ankle joints. It acts as a 
knee flexor and ankle extensor. Gastrocnemius muscle is connected to the foot by an elastic 
tendon. 

In jumping, the activation of the biarticular gastrocnemius muscle prior to the end of the 
push-off enables the transportation of the power generated by the knee extensors from the 
knee to the ankle joint. This transfer of mechanical energy by gastrocnemius can be 
explained using the following example. During the push-off phase of the jump, the knee 
joint is rapidly extended as a result of the positive work done by the knee extensor muscles. 
If the biarticular gastrocnemius muscle contracts isometrically (its length does not change), 
the additional mechanical work is done at the ankle joint because of the gastrocnemius 
muscle, which contributes no mechanical work by itself. A part of the energy generated by 
the knee extensors appears as mechanical work at the ankle joint and the height of the jump 
is significantly increased. This is because, as the jump proceeds and the knee straightens, the 
angular position changes of the knee have progressively less effect on vertical velocity of the 
jumper's centre of gravity. By gastrocnemius muscle activation, a rapid extension of the foot 
is produced. This extension has a greater effect on the vertical velocity than the extension of 
the almost straightened knee. The energy is more effectively translated into vertical velocity 
and a greater height of the jump is achieved. However, the timing of the gastrocnemius 
muscle activation is critical to obtain a maximum effect. This was demonstrated by an 
articulated physical model of the vertical jump by Bobbert et al (1986). 

Besides biarticularity, the gastrocnemius muscle has one more interesting feature. It is 
connected to the foot by an elastic tendon (see Fig. 1). The elasticity in the muscle fibers and 
tendons plays an important role in enhancing the effectiveness and the efficiency of human 
performance. An enhanced performance of human motion has been most effectively 
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demonstrated for jumping and running (Cavagna, 1970; Bobbert et al., 1996; Shorten, 1985). 
An important feature of elastic tissues is the ability to store elastic energy when stretched 
and to recoil this energy afterwards as a mechanical work (Asmussen and Bonde-Petersen, 
1974). Beside this feature, oscillatory movements performed at the natural frequency of 
muscle-tendon complex could maximize the performance. A countermovement vertical 
jump can be treated as one period of oscillatory movement and from this point of view the 
natural frequency, as well as parameters that define the natural frequency, can be 
determined. 

The purpose of our research was twofold. First to analyse the human vertical jump and to 
show that for each and every subject there exists an optimal triceps surae muscle-tendon 
complex stiffness that ensures the maximal possible height of the vertical jump. We defined 
the influence of the m. gastrocnemius activation timing and the m. gastrocnemius and 
Achilles tendon stiffness on the height of the vertical jump and established the methodology 
for analysis and evaluation of the vertical jump. We monitored kinematics, dynamics and m. 
gastrocnemius electrical activity during the maximum height countermovement jump of 
human subjects and measured viscoelastic properties of the m. gastrocnemius and Achilles 
tendon using the free- vibration technique. Based on the findings of the biomechanical study 
of the human vertical jump we performed a simulation study of the humanoid robot vertical 
jump. As a result of our research we propose a new human inspired structure of the lower 
extremity mechanism by which a humanoid robot would be able to efficiently perform fast 
movements such as running and jumping. 

2. Biorobotic Model of Vertical Jump 

Biorobotic model of the vertical jump consists of the dynamic model of the musculoskeletal 
system and of the mathematical model used for the motion control of the model. The results 
of the modelling are differential equations and a diagram for simulation and optimization of 
the vertical jump. 

2.1 Dynamic Model of Musculoskeletal System 

Vertical jump is an example of a movement that can be satisfactorily observed and 
analyzed in just a sagital plane. Therefore we built a model of the musculoskeletal system 
in a two dimensional space of the sagital plane. Because both lower extremities perform 
the same movement during the vertical jump, we joined both extremities in one extremity 
with three rigid body segments. Trunk, head and upper extremities were modeled as one 
rigid body with a common COG, mass and moment of inertia. The model of the 
musculoskeletal system is therefore a planar model composed of four segments that 
represent the foot, shank, thigh and trunk together with the head and both upper 
extremities. Segments of the model are joined together by frictionless rotational hinges 
whose axes are perpendicular to the sagital plane. The contact between the foot and the 
ground is modeled as a rotational joint between the tip of the foot and the ground. A 
model, whose foot is connected to the ground by a rotational joint, is applicable only for 
the push-off and landing phases of the vertical jump and is not applicable for the flight. 
As the motion of the COG during the flight is simply described and depends only on the 
speed vector of the COG just prior to the takeoff, this simplification does not present any 
limitations. 
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Fig. 2 shows the planar model of the musculoskeletal system, composed of four rigid bodies 
that represent the foot, shank, thigh and trunk together with the head and both upper 
extremities. The origin of the base coordinate system is in the center of the virtual joint that 
connects the foot with the ground. 




Fig. 2. Planar model of the musculoskeletal system 



Passive constraints in the hip, knee and ankle that define the range of motion of these joints 
were modeled as simple nonlinear springs (Audu and Davy, 1985; Davy and Audu, 1987). 
Fig. 2 also shows the model of the biarticular link that consists of the gastrocnemius muscle 
(GA) with stiffness kcA and damping b and the Achilles tendon (AT) with stiffness Icat- 
Contrary to the real gastrocnemius muscle, biarticular link can not contract. It can only be 
enabled or disabled at different moments during the push-off phase of the jump. High 
pennation angle of the gastrocnemius muscle fibers suggest that the predominant role of the 
m. gastrocnemius is not in its contraction but in its ability to bear high forces and to enable 
the energy transportation from the knee to the ankle (Bogert et al., 1989; Legreneur et al., 
1997). Therefore our simulated biarticular link that can not contract and can only be enabled 
or disabled is considered as an appropriate model for this study. 

Insertions of the biarticular link on the foot (B) and thigh (C) have been determined from the 
muscle data provided by Brand et al. (1982) and Delp (1990). 
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Vector of the force in the biarticular link/ is 

f = k-(BC-BC )-bBC, (1) 

where k represents the stiffness of the m. gastrocnemius and Achilles tendon connected in 
series, BC is the vector between the insertions of the biarticular link on the foot and thigh. 
BCo is the vector BC in the moment of the gastrocnemius muscle activation. Force in the 
biarticular link/ causes a torque in the ankle joint 

Qu2=-\\r B xfl (2) 

where rs is the moment arm vector from the center of the ankle joint to the insertion of the 
biarticular link on the foot and a torque in the knee joint 

Qm=\\r c xfl (3) 

where re is the moment arm vector from the center of the knee joint to the insertion of the 

biarticular link on the thigh. 

Motion of the musculoskeletal system is written with a system of dynamic equations of 

motion 

H(q)q + h(q, q) + G(q) = Q mov + Q pm + Q bl , (4) 

where Q v „ s is the vector of joint torques caused by the passive constraints and Qu is the 
vector of joint torques caused by the biarticular link. Qmov is the vector of joint torques 
caused by muscles and represents the input to the direct dynamic model of the 
musculoskeletal system. The output from the direct dynamic model is the vector of joint 
displacements q. We determined parameters of (4) H(q),h(q,q),G(q) using the equations 

published by Asada and Slotine (1986). Simulation diagram of the direct dynamic model of 
the musculoskeletal system is shown in the shaded rectangle of the Fig. 3. 

2.2 Motion Control 

Motion controller of the musculoskeletal system was designed to meet the following four 
requirements: 

1. Perpendicular projection of the body's COG on the ground coincides with the virtual joint 
that connects the foot with the ground during the entire push-off phase of the vertical jump. 
Therefore balance of the jumper and verticality of the jump is assured. Equation that 
describes this requirement is 

4M = o, (5) 

where Xj(t) is the distance between the desired perpendicular projection of the body's COG 
on the ground from the origin of the base coordinate system in time t. 

2. Motion controller assures the desired vertical movement of the body's COG relative to the 
ankle y TA (t) . By controlling the movement of the body's COG relative to the ankle, we 
excluded the influence of the biarticular link on the motion y TA (t) . Therefore parameters of 
the biarticular link can be varied and optimized for a certain desired motion of the body's 
COG relative to the ankle. 

3. Motion controller assures a constant angle of the foot relative to the ground qi before the 
biarticular link activation occurs. Thus the number of degrees of freedom of the model 
remains constant during the push-off phase of the vertical jump 
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4. In the moment, when the biarticular link activates, motion controller sets the torque 
in the ankle joint Q2 to zero and thus enable a free motion of the foot relative to the 
ground. By setting the torque in the ankle joint to zero, the motion in the ankle joint is 
only a function of the motion in the knee joint that is transmitted to the ankle by the 
biarticular link. 

Motion controller that considers the requirement (5) and enables the desired vertical motion 
of the COG relative to the ankle l/j A (i) in the base coordinate system is 




Vta+Va 



+ k„ 




Vta+Va 



+k. 




Vta+Va 



(6) 



where k p and kd are coefficients of the PD controller, x\- is the vector of the control 
acceleration of the COG in the base coordinate system, xja is the current height of the ankle 
joint relative to the ground, x T ,x T are the vectors of the current speed and position of the 
COG in the base coordinate system. 

The relation between the vector of the control speed of the COG in the base coordinate 
system x c T and the vector of the control angular velocities in the joints q c is 



: Jt<7c ' 



(7) 



where Jt is the Jacobian matrix of the COG speed in the base coordinate system. Equation (7) 
represents an under determined system of equations. From the requirement that the motion 
controller assures a constant angle of the foot relative to the ground qi before the biarticular 
link activation occurs, follows the condition 



la 



= 0. 



(8) 



An additional condition that abolishes the under determination of (7) is the relationship of 
the knee and hip joint angles 

4 c 4= W -4c3> ( 9 ) 

where n is the coefficient that describes the relationship. 

By substitution of (8) and (9) into (7) we get a relation between the vector of the control 
speed of the COG in the base coordinate system x c T and the vector of the control angular 
velocities in the ankle and knee joints q' c 

x c T =y T q' c , (10) 

where J^ is a new Jacobian matrix of the center of gravity speed in the base coordinate 
system. Differentiation of (10) with respect to time yields 



<7c3 



■■]' T -\x c T -]' T q) , 



where 



(11) 



(12) 



On the basis of conditions (8) and (9) and relation (11) we define control angular 
accelerations in all four joints 
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By substitution of (13) into a system of dynamic equations of motion (4) we get control 
torques in joints Q„ m , that we need to control the model of the musculoskeletal system 

Q„ t0V =H(q)ij c +h(q,c,) + G(q). (14) 

Direct dynamic model of the musculoskeletal system together with the motion controller 
compose the biorobotic model of the vertical jump. Simulation diagram for the simulation of 
the vertical jump is shown in Fig. 3. Inputs into the simulation diagram are the desired 
trajectory of COG relative to the ankle y* A and a signal for biarticular link activation a. 
Output from the simulation diagram is the vector of body's COG position Xj. 
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Fig. 3. Simulation diagram for the simulation of the vertical jump. Inputs into the simulation 
diagram are the desired trajectory of the COG relative to the ankle and a signal for 
biarticular link activation. Output from the simulation diagram is the vector of jumper's 
COG position. 



3. Biomechanical Analysis of Human Vertical Jump 

3.1 Subjects and Experimental Protocol 

Ten trained male subjects (age 26 ± 4 years, height 180.3 ± 6.56 cm, body mass 77.1 ± 7.24 kg) 
participated in the study. Informed consent was obtained from all of them. The protocol of 
the study was approved by the National Ethics Committee of the Republic of Slovenia. The 
experiments comply with the current laws of the Republic of Slovenia. 

After a warm-up procedure and three practice jumps, subjects subsequently performed four 
countermovement vertical jumps. They were instructed to keep their hands on the hips and 
to jump as high as possible. At the beginning of each vertical jump, subjects stood on their 
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toes and in the erected position. During each jump, position of anatomical landmarks over 
epicondylus lateralis and fifth metatarsophalangeal joint were monitored, ground reaction 
forces were measured and electromyogram of m. gastrocnemius was recorded. After the 
jumping, we determined viscoelastic properties of the triceps surae muscle tendon complex 
of all subjects. Details on methods and procedures are provided in the following sections. 

3.2 Anthropometric Measurements 

Segmental anthropometric parameters, such as masses, moments of inertia about the 
transverse axes, lengths and locations of the centers of gravity were estimated using 
regression equations (Zatsiorsky and Seluyanov 1983; Leva 1996). Position of the first 
metatarsophalangeal joint and the Achilles tendon insertion on the calcaneus were 
determined by palpation for each subject. Insertion of the m. gastrocnemius on femur was 
determined using the muscle data collected by Brand et al. (1982) and Delp (1990). 

3.3 Kinematics and Dynamics 

To determine the motion of the body's COG during the vertical jump we measured the 
vertical component of the ground reaction force caused by the subject during the jump. 
Subjects performed vertical jumps on a force platform (Kistler 9281CA) that is capable to 
measure ground reaction forces with the frequency of 1000 Hz. We zeroed the force 
platform just prior to the vertical jump when the subject was standing still in the erected 
position. Thus we enabled the precise determination of the subject's body mass. Body mass 
of the subject standing on the force platform is therefore equal to the quotient between the 
negative ground reaction force during the flight phase of the jump and the ground 
acceleration. The vertical position of the body's COG relative to the ground in time t y™(t) 
was obtained by double integrating with respect to time the vertical ground reaction force in 
time t F(t) 

y?(t) = — jJF(t)dtdt + y!?(0) (15) 

where m is the body mass of the subject and i/™(0) is the initial height of the body's COG 
relative to the ground. To determine the vertical position of the body's COG relative to the 
ankle in time t y™ A (t) we measured the motion of the ankle during the vertical jump by 

means of the contactless motion capture system (eMotion Smart). The vertical position of the 
body's COG relative to the ankle is therefore 

y? A (t)=ym-ym (16) 

where y m A (t) is the vertical position of the ankle in time t. 

3.4 Electromyography 

The activity of the m. gastrocnemius was recorded using a pair of surface electrodes put 
over the medial head of the m. gastrocnemius. Analog EMG signals were amplified and 
filtered with a band-pass filter with cut off frequencies at 1 Hz and 200 Hz. The signals were 
then digitalized with 1000 Hz sampling frequency and full- wave rectified. To reduce the 
variability of sampled EMG signal, the signal was then smoothed with a digital low-pass 
Butterworth filter. Finally the EMG signal was normalized with respect to the maximum 
value attained during the vertical jump. 
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3.5 Measurements of Muscle-Tendon Viscoelastic Properties 

Triceps surae muscle- tendon complex viscoelastic properties of both legs were measured for 
each subject using the free-vibration method described by Babic and Lenarcic (2004). The 
measurement device and the procedure have been designed in such a manner that as few 
human body segments move as possible during the measurement. Thus the measurement 
uncertainty due to the approximation of the properties of the human body segments was 
minimized. The results of the measurements are the elastic stiffness kcA and viscosity b of 
the m. gastrocnemius and the elastic stiffness Icat of the Achilles tendon. 

3.6 Treatment of Data 

For the purposes of analysis and optimization of the vertical jump we adjusted the 
biomechanical model of the musculoskeletal system with segmental anthropometric 
parameters, such as masses, moments of inertia about the transverse axes, lengths and 
locations of the centers of gravity of each subject. Parameters of the biarticular link, such as 
insertion of the m. gastrocnemius on femur, insertion of the Achilles tendon on calcaneus, 
elastic stiffness and viscosity were adjusted to match the measured parameters of each 
subject. 

To simulate the vertical jump of the individual subject we used the measured trajectory 
of the body's COG as the input into the biomechanical model of the vertical jump. 
Biarticular link activation that is also an input into the biomechanical model of the 
vertical jump was determined from the EMG signal of the m. gastrocnemius. The 
moment of biarticular link activation was determined as the moment when the rectified, 
normalized and filtered EMG signal of the m. gastrocnemius increased to 95% of its 
maximum value. After the activation, the biarticular link remains active during the 
entire push-off phase of the jump. 

3.7 Results 

To determine the optimal timing of the biarticular link activation that results in the highest 
vertical jump, a series of the countermovement vertical jump simulations have been 
performed for each subject. Each simulation was performed with a different timing of the 
biarticular link activation. 

All subjects activated their m. gastrocnemius slightly before the optimal moment, 
determined by means of simulations. In average, the difference between the optimal 
and measured knee angle when the m. gastrocnemius was activated was 6.4 ± 2.22°. 
Because the dynamic model of the musculoskeletal system does not include the 
monoarticular muscle soleus, the measured heights of the jumps were higher than the 
jump heights determined with the simulations for 4.3 ±1.12 % in average. The primary 
motive for omitting the m. soleus from the modelling is that we wanted to control the 
motion of the body's COG relative to the ankle so that the parameters of the 
biarticular link could be varied and optimized for a certain measured motion of the 
subject's COG relative to the ankle. If the dynamic model of the musculoskeletal 
system would include the m. soleus, the motion of the ankle would be fully 
determined by the force of the m. soleus and we would not be able to control it with 
regard to the desired body's COG relative to the ankle. Moreover if the dynamic 
model of the musculoskeletal system would include the m. soleus, force produced by 
the m. soleus would be another input into the biomechanical model of the vertical 
jump and we would have to accurately measure the force produced by the m. soleus 
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of subjects performing the vertical jump. An additional cause for the differences 
between the measured heights of the jumps and the jump heights determined by 
means of simulations can be the simplified model of the foot that we modeled as one 
rigid body. The arch of the foot is linked up by elastic ligaments that can store elastic 
energy when deformed and later reutilize it as the mechanical work (Alexander, 
1988). Ker et al. (1987) measured the deformation of the foot during running and 
determined the amount of the energy stored during the deformation. They showed 
that the elasticity of the foot significantly contribute to the efficiency of the human 
movement. To be able to compare the measurements and the simulation results, we 
corrected the simulation results by adding the contribution of the m. soleus to the 
height of the vertical jump. Such corrected heights of the vertical jumps at the optimal 
moments of m. gastrocnemius activation are insignificantly larger than the measured 
heights of the vertical jumps for all subjects. In average the height difference is only 
1.6 ±0.74 cm. 

To determine the optimal stiffness of the Achilles tendon regarding to the height of the 
vertical jump, a series of the countermovement vertical jump simulations have been 
performed, each with a different stiffness of the Achilles tendon. Optimal timing of the 
biarticular link has been determined for each stiffness of the Achilles tendon as described in 
the previous paragraph. The measured values of the Achilles tendon stiffness for all subjects 
were always higher than the optimal values determined by means of simulations. By 
considering the elastic properties of the arch of the foot, we can assume that the optimal 
values of the Achilles tendon stiffness would increase and therefore the differences between 
the optimal and measured values would decrease. 

Results of the measurements, simulations and optimizations of the human vertical jumps 
are presented in Fig. 4. Subjects are arranged with regard to the ratio between the optimal 
stiffness of the Achilles tendon determined by means of simulations and the measured 
stiffness of the Achilles tendon. If we want to evaluate the contribution of the viscoelastic 
properties to the height of the jump, ranking of the subjects with regard to the height of the 
jump is not appropriate because the main parameter that influences the height of the vertical 
jump is the power generated by muscles during the push-off phase of the jump. The 
elasticity of the Achilles tendon has the secondary influence on the height of the jump. 
Results show that for the same power generated by an individual subject during the push- 
off phase of the jump, the height of the vertical jump varies with the Achilles tendon 
stiffness. Therefore the appropriate criterion for ranking the subjects have to consider the 
elasticity of the Achilles tendon. 

4. Simulation and Optimization of Vertical Jump 

4.1 Simulation Results 

Simulations of the counter movement vertical jumps were performed. At the beginning of 
the simulation the legs were fully straightened. A counter movement was performed until 
the desired minimal knee angle was reached. The flexion of the joints was such that the 
centre of gravity of the robot dropped with the acceleration of the gravity. Lifting of the foot 
from the floor would occur if faster flexions would be applied. Flexion was immediately 
followed by a rapid extension which provided the energy for the jump. An example 
trajectory of the robot center of gravity during the countermovement vertical jump is shown 
in Fig. 5. 
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Subject 
Fig. 4. Results of the measurements, simulations and optimizations of the vertical jumps. 
Whole bars represent the maximal heights of the vertical jumps achieved with the optimal 
values of the Achilles tendon stiffness and determined by means of simulations. The dark 
shaded parts of the bars represent the measured heights of the vertical jumps while the light 
shaded tops represent the differences between the maximal and measured heights of the 
vertical jumps. The differences are also shown as percentages relative to the measured 
heights. Bold line represents the ratio between the optimal stiffness of the Achilles tendon 
determined by means of simulations and the measured stiffness of the Achilles tendon for 
each subject. 
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Fig. 5. An example trajectory of the robot center of gravity during the countermovement 
vertical jump. Time interval between Os and fc represents the push-off phase of the jump 
while the time interval between fe and t$ represents the flight phase of the jump, ti is the 
time when robot center of gravity starts to decelerate in the downward direction. 
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The height of the vertical jump can be calculated with the following equation 

yr„»x=1/TA( f 2) + l/A( f 2)-^-(l/TA( f 2) + l/A( f 2)) 2 - ( 17 ) 

2 S 

As evident from (17), there are four parameters whose values in time tj influence the height 

of the vertical jump: 

the height of the robot center of gravity relative to the ankle joint y TA (t 2 ) , 

the height of the ankle joint relative to the ground y A (t 2 ) , 

the velocity of the robot center of gravity relative to the ankle joint y TA (t 2 ) and 

the velocity of the ankle joint relative to the ground in the vertical direction y A (t 2 ) ■ 

The height and the velocity of the robot center of gravity relative to the ankle joint in time ti 
depend only on the power of the actuators in the hip and knee joints while the height and 
the velocity of the ankle joint relative to the ground in time tz is the consequence of the feet 
motion dictated by the biarticular link. Therefore the height and the velocity of the ankle 
joint relative to the ground in time ti depend on the biomechanical parameters of the 
biarticular link and its activity during the push-off. 



4.2 Optimization of Vertical Jump 

To determine the optimum timing of the biarticular link activation that results in the highest 
vertical jump, a series of the countermovement vertical jump simulations have been 
performed, each with a different timing of the biarticular link activation. The influence of 
the biarticular link elasticity was eliminated by increasing the biarticular link stiffness 
towards the infinity. Thus we achieved a stiff biarticular link with no elastic properties. The 
relationship of the jump height and the biarticular link activation moment is presented in 
Fig. 6. 
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Fig. 6. Jump height as a function of the knee angle when the biarticular link is activated. 

Biarticular link activation moment is presented with the knee angle when the biarticular link 
is activated. The highest jump has been achieved when the biarticular link was activated in 
the moment when the knee angle was approximately -84°. With this optimal timing, the 
robot jumped almost twice a high as when the biarticular link was not actuated or was 
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actuated when the knee was almost fully extended. If the biarticular link activates too early 
in the push-off phase, the robot looses its ground contact too early when its center of gravity 
has not reached the maximal velocity. If the biarticular link activates too late in the push-off 
phase, the rotational energy of the thigh and shank increases uselessly. The optimal timing 
represents a compromise between these two cases. 
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Fig. 7. Jump height as a function of the biarticular link stiffness. 
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To determine the optimum stiffness of the biarticular link regarding to the height of the 
vertical jump, a series of the countermovement vertical jump simulations have been 
performed, each with a different stiffness of the biarticular link. For each and every stiffness 
of the biarticular link the optimum timing has been determined as described in the previous 
paragraph. The relationship of the jump height and the biarticular link stiffness is presented 
in Fig. 7. 

The highest jump has been achieved when the stiffness of the elastic biarticular link was 
approximately 245 kN/m. The highest jump was approximately 16% higher than the highest 
jump with the stiff biarticular link. We can thus conclude that the reutilization of the elastic 
energy stored during the countermovement contributes 16% to the height of the vertical 
jump. 



4.3 Sensitivity Analysis 

The purpose of the sensitivity analysis described in this section is to determine the 
sensitivity of the vertical jump height to the variations of the biarticular link parameters. We 
performed local sensitivity analysis that refers to variations of parameter around the values 
that ensure the maximal possible height of the vertical jump. These optimal values were 
determined with the optimization methods as described in the previous section. Changes of 
the vertical jump height that result from variations of biarticular link stiffness k, biarticular 
link damping b and ratio between the moment arms of the biarticular link tq/tb are 
presented in Fig. 8. Diagram in Fig. 8 also shows whether a certain parameter change results 
in an increase or in a decrease of the vertical jump height. 
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Fig. 8. Changes of vertical jump height in relation with variations of biarticular link stiffness 
k, biarticular link damping b and ratio between the moment arms of the biarticular link rc/rg. 
Multiplicator represents relative changes of parameter values. 

As the sensitivity analysis refers to the model with optimal biarticular link stiffness, both 
positive and negative change of the biarticular link stiffness k cause a decrease of the jump 
height. An increase of the ratio between the moment arms of the biarticular link tc/te causes 
an increase of the jump height. Higher ratio between the moment arms means higher 
angular speed in the ankle joint as a result or the biarticular link activation. Beside the 
increase of the angular speed in the ankle joint, also the force in the biarticular link 
increases. Variations of the biarticular link damping b only have negligible effect on the 
height of the vertical jump. 



5. Discussion 

By building an efficient biorobotic model which includes an elastic model of the biarticular 
muscle gastrocnemius and by simulation of the vertical jump we have demonstrated that 
biarticular links contribute a great deal to the performance of the vertical jump. Besides, we 
have shown that timing of the biarticular link activation and stiffness of the biarticular link 
considerably influence the height of the jump. 

Methodology and results of our study offer an effective tool for the design of the humanoid 
robot capable of performing vertical jumps. We propose a new human inspired structure of 
the lower extremity mechanism that includes an elastic biarticular link and by which a 
humanoid robot would be able to efficiently perform fast movements such as running and 
jumping. However, it has to be considered that this study deals only with one biarticular 
link. Although the biarticular link we included in our study has the most distinctive elastic 
properties among all biarticular muscles of the human leg, other biarticular muscles such as 
the recuts femoris and the hamstrings should be included in humanoid robot design. A 
special challenge would be to design a humanoid lower extremity that includes all 
biarticular muscles of the human lower extremity and to demonstrate their joint effect on the 
vertical jump performance. 
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1. Introduction 

The potential market of service and entertainment humanoid robots has attracted a great 
amount of research interests in the recent years. Several models of humanoid robots have 
been designed in research projects. However, it remains a great challenge to make 
humanoid robots move autonomously. Motion planning is one of the key capabilities that 
an autonomous robot should have. An autonomous robot should be able to accept high- 
level commands and move in a real-life environment without colliding with environmental 
obstacles. A high-level command is something like "Move to location A on the second floor" 
while the robot is currently at some location B on the first floor, for example. It is an 
interesting problem for humanoid robots since locomotion capability possessed by a 
humanoid robot is usually much better than a mobile robot. Like a human, a humanoid 
robot should be able to step upstairs or downstairs and stride over small obstacles or a 
narrow deep gap in a complex environment such as the one shown in Fig. 1. Similar needs 
for the planning capability also arise in the domain of computer animation in generating 
motions for autonomous characters. 

The motion for a humanoid robot to achieve a given goal is typically very complex because 
of the degrees of freedom involved and the contact constraint that needs to be maintained. 
Therefore, it is common to take a two-level planning approach to solve this problem. In our 
previous work (Li et al, 2003), we have been able to plan efficient humanoid walking 
motions in a layered environment. The approach used in the planner decomposed the 
planning problem into subproblems of global and local planning, each of which is easier to 
solve. The global planner assumes some basic properties of a humanoid and uses an 
approximated geometric shape to define the path planning problem. The path generated by 
the global planner is then passed to the local planner to realize the path with appropriate 
walk motions. However, in previous work, the locomotion of a humanoid is usually limited 
to forward walking only, and a humanoid cannot pass a deep gap even if it can stride over 
it. In this chapter, we will describe a motion planning system adopting the two-level 
approach and extending the work to overcome the above two limitations. We will present 
an efficient implementation of the planner in terms of space and time such that it can be 
used in an on-line manner. 

The rest of the chapter will be organized as follows. In the next section, we will review the 
researches pertaining to our work. In the third section, we will give a formal description of 
the problem we consider in this chapter. We will then propose the planning algorithm in the 
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fourth section and present some implementation details and experimental results in the fifth 
section. In the last section, we will conclude the chapter with some future directions. 




Fig.l. Example of a layered virtual environment with deep narrow gaps (colored columns 
between two large platforms) 



2. Related Work 

Motion planning problems have been studied for more than three decades. An overview of 
motion planning algorithms can be found in Latombe's book (Latombe, 1991). The 
researches pertaining to humanoid motion planning or simulation can be found in the fields 
of robotics and computer animation (Kuffner et al., 2001; Polland et al., 2002; Sun & Dimitris, 
2001). These researches differ mainly on the way they perform global path planning and 
how locomotion is taken into account. For example, early work in computer animation 
focused on generating human walking motion to achieve a high-level goal (Bruderlin & 
Calvert, 1989). However, no planning was done to generate the global path automatically. 
Since the application was mainly for computer graphics, how to simulate human walking 
with realistic looking was the main concern (Sun & Dimitris, 2001). A dynamic filtering 
algorithm was proposed in (Yamane & Nakamura, 2000) to ensure that the generated 
motions could be transformed into a dynamically feasible one. 

In an early paper by Kuffner (1998), a gross motion planner utilizing graphics hardware was 
proposed to generate humanoid body motions on a flat ground in real time. Captured 
locomotion was used in this case to move the humanoid along the generated global path. In 
(Kalisiak & Panne, 2001), a stochastic search approach with versatile locomotion was 
proposed to generate humanoid motions in an unstructured environment where a set of 
predefined grasp points served as contact constraints. In (Choi et al, 2003), sequences of 
valid footprints were searched through augmented probabilistic roadmap with a posture 
transition graph, and then each pairs of footprints were substituted by corresponding 
motion clips. Since the motion clips were captured in advance, the locomotion may not be 
flexibly adapted to uneven terrain to avoid collisions. In (Shiller et al, 2002), a multi-layer 
grid was used to represent the configuration space for a humanoid with different types of 
locomotion such as walking and crawling, and the humanoid may change its posture along 
the global path. 

In (Pettre et al., 2003), a digital actor was modelled with active (all degrees of freedom 
attached to the legs) and reactive (attached to the upper parts of the body) degrees of 
freedom. A collision-free trajectory was computed for the active part of the digital actor. 
Then the motion warping technique (Witkin & Poppvic, 1995) was applied to the motion 
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capture data along the path when the reactive part of the digital actor collided with 
obstacles. The planner proposed in (Chestnutt et al., 2003) evaluated footstep locations for 
viability using a collection of heuristic metrics of relative safety, effort required, and overall 
motion complexity. At each iteration of the search loop, a feasible footstep was selected from 
the footstep transition sets. The global path of this approach was a sequence of footstep 
locations toward a given goal state. 

In (Kuffner et al., 2001a; Kuffner et al., 2001b), a humanoid robot with real-time vision and 
collision detection abilities was presented. The robot could plan its footsteps amongst 
obstacles but could not step onto them. Considering locomotion directly in global path 
planning may generate more complete result but, on the other hand, it limits the flexibility 
of locomotion. In our previous work (Li et al., 2003), we were able to plan humanoid 
motions in real time with a given general description of the objects in the workspace. A 
more detailed description of this approach will be given in the next two sections. 

3. Problem Description 

3.1 Problem Overview 

According to motion granularity, the motion-planning problem usually can be classified 
into global (gross) motion planning and local (fine) motion planning. In the global motion- 
planning problem we concern with working out the body logistics, such as planning a 
collision-free path amongst trees in a forest to reach some destination. In contrast, for the 
local motion-planning problem we focus more on limb logistics, such as planning hand 
motion to grab an awkwardly placed object. For the problem of walking on a layered 
complex environment for a humanoid robot, both types of planning needs to be considered 
in order to ensure that the desired task can be accomplished with feasible motion plans. 
The inputs to a typical motion planner for a humanoid robot include the initial and goal 
configurations of the robot, the kinematics description and locomotion abilities of the robot, 
and a geometric description of objects in the environment. In our approach, the planning 
problem is decomposed into two subproblems: the global motion planning for moving the 
body trunk of a humanoid and the local motion planning for realizing the global motion 
plan with the chosen locomotion. The planners at the two levels can be linked together to 
solve the problem in sequence as well as to feed back failure situation for further replanning 
as shown in Fig. 2. The global motion planning will be the main concern of this chapter. 
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Fig.2. Planning loop for a typical query of humanoid motion. 
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3.2 Modeling the Humanoid 

The kinematics description of a humanoid robot includes the maximal gait size, the maximal 
step height (h), and the bounding cylinder for each type of locomotion that the robot can 
adopt. Unlike traditional motion planning problems where the obstacles are given explicitly, 
the definition of obstacles in our case depends on the kinematics properties of the humanoid 
as well as the geometry of the objects. For example, an object is an obstacle to a humanoid if 
there is no way for the humanoid to step onto or pass under the object for a given 
locomotion due to the height and leg length of the humanoid. 

In the global motion planning for a humanoid, we assume that the humanoid can be 
modeled as a bounding cylinder to simplify collision detection. In our system, we uses a 
large and a small cylinders to model a humanoid for frontal and lateral walking, 
respectively, as shown in Fig. 3. The radius of the inner cylinder is the size of the minimal 
region for a stable stance. The radius of the outer cylinder is determined according to the 
type of locomotion. We use the largest lateral width orthogonal to the moving direction of 
the humanoid to determine the radius. For example, the cylinder used in planning the side- 
walk motion is smaller than the one for regular walking motion. The height of the cylinders, 
denoted by H, is also related to the height and locomotion type of the humanoid. If we allow 
the humanoid to bend its upper body during the walking cycle, the actual height may be 
lower than the height of the humanoid. When using cylinders to model the geometry of a 
humanoid, we actually ignore its orientation at the planning time. We assume that we can 
recover the orientation of a humanoid in a postprocessig step according to the generated 
path and the adopted locomotion. 

We assume that a humanoid robot can perform several types of locomotion and choose the 
most appropriate one according to the environment. Although many types of locomotion 
can be considered and implemented, only frontal walking, side walking, and jumping are 
demonstrated in this work. In addition, we assume that the local planner can generate 
necessary motion transition from one type of locomotion to another at a given configuration 
along a path. 
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Fig.3. Frontal and lateral walking model for the humanoid. 
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4. Motion Planning for Humanoid 

As described in the previous section, we use a decoupled approach to reduce the complexity 
of the motion-planning problem. In this section, we will focus on addressing the global path- 
planning problem for a humanoid with multiple types of locomotion including striding. We 
will first describe how we model the environment and prepare the search space according to 
the given environmental description and the data of a humanoid. Then we will present the 
planning algorithm that is used to search for a feasible path with these abilities. 

4.1 Modeling the Environment 

We assume that we are given the geometric description of the objects in the workspace such 
as the one shown in Fig. 4(a) and 4(b). Each object is described by a height and an offset 
from a reference level (such as the ground) in addition to its polygonal description. 
According to this geometric description, one can discretize the workspace into a grid of cells 
with an appropriate resolution. The resolution is chosen such that the area of a cell can allow 
a foot of the humanoid to step onto. Each cell in the workspace is given an offset value 
representing the distance from the bottom of an object to the referenced ground. Each cell is 
also assigned a height value above the offset. According to the offset values of the objects, 
we separate the 3D workspace into multiple connected 2D layers. For each layer, we 
compute a height map containing the elevation value of each cell above the layer in the 
workspace grid as shown in Fig. 4(c) and 4(d). Each cell in the layer / is referred to the same 
offset value d representing the distance from the base of the layer / to the referenced 
ground. Each cell i is also assigned a height value a \ above the layer /. The cell i of the 

height map for the layer I is represented as c\ = a\ ■ Therefore, in a layered environment, a 
point may have different offset and height values on these different layers. 
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Fig. 4. (a) Top view of the workspace, (b) side view of the workspace, (c) and (d) are the 
height maps (the darker, the higher), (e) and (f) are the reachability maps of first and second 
layers (reachable regions in white). 
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The layered representation approach is appropriate for the gross motion-planning problem 
of a humanoid robot since the humanoid needs to stay on the ground all the time, even 
when climbing up or stepping down stairs. Therefore, a configuration q in workspace can be 
represented as (x, y, I), which means that the humanoid is standing at position (x, y) and on 
the "ground" of layer /. 



4.2 Basic Reachable Region 

Given an initial configuration of the humanoid, we compute a map, called reachability map, to 
represent the reachable regions from the initial configuration. The unreachable cells are 
marked as obstacle regions in this map. A reachability map may consist of several slices with 
one slice for each layer. For example, Fig. 4(e) and 4(f) show the reachability map for the two 
layers of the scene in Fig. 4(a). The black regions are the unreachable regions marked as 
obstacles. These slices could be "connected" if there exists an object whose height is large 
enough to bring the humanoid to step onto some neighboring cells of the above or below layer. 
We compute this map by a wave propagation algorithm similar to the one used to construct 
NF1 potential fields (Latombe, 1991). Suppose that the current cell under propagation is i. The 
algorithm advances to a neighboring cell i at the layer I or a neighboring layer / only if the 
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Fig.5. (a) height map for the environment in Fig. 1., (b) map in (a) after applying dilation, (c) 
map in (b) after applying erosion. After the closing operator, the columns in the 
discontinuous region are connected, (d) instability map 
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4.3 Reachable Region for Striding Ability 

For a humanoid with versatile motion abilities, the reachable region in the workspace is 
affected by several factors. For the walking motion, we need to take the maximal gait size 
and leg length of the humanoid as well as the height of the bounding cylinder into account. 
In the previous subsection, we have described a method to compute the reachability map for 
an environment with several layers. However, for an environment with deep narrow gaps 
such as the one shown in Fig. 1, large height differences between neighbouring cells prevent 
a humanoid to move across the region occupied by the columns. The main difficulty comes 
from the fact that we only check the neighbours that are one cell away although the maximal 
gait size of the humanoid could occupy several cells. 

A straightforward approach to overcome this problem is extending the search algorithm to 
visit all the neighbours that are a few cells away from the current configuration and take the 
legal ones for further processing. An obvious drawback of this approach is that the number 
of neighbours increases rapidly as the gait size increases. The planning time will suffer as 
the number of visited cells soars. In addition, if we do not have a good idea about the 
reachable region, it would be difficult to compute an effective potential field to guild the 
search. 

In this work, we propose a method for reachability computation that can account for the 
striding ability of a humanoid to handle the discontinuous regions mentioned above. The 
idea is that we can connect together the separated regions within the gait range by using the 
"Closing" operator, an important morphological operator in image processing (Gonzale & 
Woods, 2002). Closing tends to smooth sections of the image. It generally fuses narrow 
breaks and long thin gulfs, eliminates small holes, and fills gaps in the image. Closing of set 
A by a structuring element B, denoted A* B, is defined as: 

A • B = (A © B)QB (1) 

The equation, in words, says that the closing of A by B is simply the dilation of A by B, followed 
by the erosion of the result by B. The height map is A in our case, and we select the circle with a 
radius of half of gait size as the structuring element B. The dilation operation (A ®B) will 
extend the edge of objects in bitmap by 1/2 gait size, which fills small gaps or holes between two 
disconnect regions if these regions are near. After dilation, the resulting bitmap shows that two 
regions are connected if their distance is within one foot step (See Fig. 5(b)). But for the extended 
edges which do not connect to other regions, we apply erosion ( AQB ) to eliminate these useless 
edges. The erosion operation will corrode the edges in bitmap by 1/2 gait size, which dose not 
break the connected regions. Fig. 5(c) shows the result after erosion. Note that the unnecessary 
edges are removed after the operation and the map gets back to its original shape as in Fig. 5(a) 
except for the small discontinuous regions that are connected now. This modified height map 
after the closing operation is then used to determine the reachability of each cell for the walking 
motion with the method described in the previous subsection. 

Since we have more than one type of locomotion available for the humanoid to use, we will 
need to compute a reachability map for each type of locomotion according to the parameters 
of the humanoid such as the step size and the size of the bounding cylinder for that type of 
locomotion. We can then take the union of these reachability maps to form the final 
reachability map. In other words, a cell is reachable if there exists at least one type of 
locomotion that can bring the humanoid to the cell. We record the available locomotion 
types for each cell such that we know how to choose an appropriate motion later in the 
search process. 



574 Humanoid Robots, New Developments 

4.4 Instability Map 

In order to know which part of the region is collision-free, we can convert the height map, 
built in the workspace, into the corresponding C-space by growing the obstacle regions with 
the radius of the bounding cylinder of the humanoid for a given locomotion. However, not 
all collision-free regions allow the humanoid to stay for long. For example, we may allow 
the humanoid to walk on a stair for trespassing purposes but do not want it to stay at the 
edge of a stair for too long. We need to identify these regions where unstable situation might 
occur. A map describing the regions is called instability map (see Fig. 5(d) for an example). A 
cell in the instability map is defined as unstable if and only if the region covered by the 
enclosing circle of a humanoid contains cells with different heights and this height 
difference is smaller than the maximal step height of the humanoid. Otherwise, if this region 
contains an object whose height difference is larger than the maximal step height of the 
humanoid, we will consider the cell a forbidden cell. Similarly, a cell is defined as gap if and 
only if the region in the height map is reachable after the closing operation. A humanoid can 
enter the unstable or gap regions for trespassing purpose but the transition time usually 
needs to be short. Therefore, we have set an upper bound for each type of these regions in 
the search process to prevent the humanoid from staying in these regions for too long. 

STABLEBFPQ 

1 install qt in T; 

2 INSERT^,, OPEN); mark q { visited; 

3 SUCCESS <- false; 

4 while -| EMPTY (OPEN) and -, SUCCESS do 

5 ^FIRST(OPEN); 

6 for every neighbor q' of q in the grid do 

7 if q ' is stable then 

8 mark q' visited; 

9 if LEGAL((j', g) then 

10 install q' in T with a pointer toward q; 

11 INSERT^', OPEN); 

12 if q' = q g then SUCCESS <- true; 

13 if SUCCESS then 

14 return the backtracked feasible path 

15 else return failure; 

Fig.6. The STABLE_BFP algorithm 

4.5 Path Planning Algorithm 

The planning algorithm that we use to compute the humanoid motion is shown Fig. 6. The 
STABLE_BFP algorithm is similar to the classical Best-First Planning algorithm (Barraquand 
& Latombe, 1991) that was used to solve path-planning problems with low DOF's. In each 
iteration of the search loop, we use the FIRST operation to select the most promising 
configuration q from the list of candidates (OPEN) for further exploration. We visit each 
neighbor q' of q, check their validity (via the LEGAL operation) and maintain their stability 
(through STABLE operation) for further consideration. The number of neighbors is related 
to the number of major locomotion, n. That is, if we visited m neighbors for each locomotion, 
we have to visited m*n neighbors in each iteration. The procedures for checking stability and 
legality are shown in Fig. 7. A configuration is legal if it is collision-free, marked unvisited, 
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and temporarily stable. It is temporarily stable if and only if the humanoid has not entered 
the unstable regions or the gap regions for longer than some threshold. This duration is kept 
as an instability counter (q'.cnt) in each cell in the unstable region and step counter (q'.step) 
in the gap region when we propagate nodes into it. Note that the validity of a configuration 
in the unstable or gap regions depends on the counter of its parent configuration. If there are 
more than one possible parent configurations, we cannot exclude any of them. Therefore, in 
the STABLE_BFP algorithm, we do not mark a configuration visited if it is in the unstable or 
gap region. A configuration in these regions can be visited multiple times as long as the 
counter does not exceed the maximal bound. In a gap region, we need to check the gap 
width with the gait size. We keep the start point of a gap (gap_begin) when we first enter the 
gap region and the end point of a gap (gap_end) when a stable region is reached. These two 
points are used to compute the gap width, and the procedure is to ensure that the gap does 
not exceed the gait size and the humanoid can stride from gap_begin to gap_end. 

STABLE (q',q) 

if q is unstable then 

^'.cnt= g.cnt+1; 

else if q is gap then 

if g.ingap then 

<7'.step= g.step+1; 
else 

<7'.step=l; 
<7'.ingap=true; 
gap_begin= q- 
else if q is stable and g.ingap then 
gap_end= q'; 

LEGAL (q',q) 
STABLE(q',q) 
if q' is visited or forbidden then 

return false; 
if q is gap and g'.step>N then 

return false; 
if q' is unstable and g'.cnt>M then 

return false; 
if q' is stable and g.ingap then 
check distance between gap_begin and gap_end; 
if the distance is large than gait size, then 
return false; 
return true; 
Fig. 7. The STABLE and LEGAL procedures. 

In the STABLE_BFP algorithm, we use the FIRST operation to select the most promising 
configuration for further exploration. In general BFP planners, the artificial potential field is 
usually the only index for the goodness criterium. Planners with this approach can usually 
yield short paths. In our case, the height difference could be an important index as well since 
one may prefer climbing up or stepping down stairs to taking a longer path. Preference on 
each available locomotion is also an important factor. For example, generally speaking, we 
prefer walking to crawling. Therefore, in the FIRST operation, we use a linear combination 
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of these criteria, whose weights are specified by the user. In general, unstable regions and 
gap regions have lower preference, and staying in these regions is not preferred. Therefore, 
we use instability (q.cnt) or step counter (q.step) as a penalty measurement to avoid motions 
over difficult areas whenever possible. 

5. Implementation and Experiments 

We have implemented the global and local motion planners in Java and connected the 
planners to a VRML browser to display the final simulation results. All the planning times 
reported below are taken from experiments run on a regular 1.6GHz PC. The size of the 
workspace is 25.6m x 25.6m, and the height and gait size of the humanoid is 180cm and 
60cm, respectively. The width of the shoulder (for enclosing cylinder) and the foot length is 
60cm and 37cm, respectively. The resolutions for the grid workspace and configuration 
space are all 256x256 in the global planner. In the following subsection, we will describe 
how to optimize storage space by merging several layers with sparse objects in them to 
fewer layers. We will also use two examples to demonstrate the ability and efficiency of the 
planner in later subsections. 

5.1 Merging Layers with Sparse Objects 

In the proposed system, we use the offset value of an object to separate the workspace into 
several 2D grid layers. However, for workspace like gyratory stairs, such as the one shown 
in Fig. 8, each stair may have a unique offset value, and we need a layer for each offset. The 
result is that each of the layered maps only contains sparse objects. As the number of layers 
increases, not only the storage space will increase, but the search performance will degrade 
as well. In our implementation, we first sort the objects in workspace by their offsets in 
ascending order. Then we add objects into the first layer one by one until an object 
overlapped with other objects in this layer. A new layer is then created on demand. After 
this process, each object is assigned to a specified layer, and the resulting number of merged 
layers is usually much smaller. Fig. 8 is the example of gyratory stairs, which need 25 layers 
if we use offsets to separate the workspace directly. If we use the merge approach 
mentioned above, only two layers are needed to represent the workspace. 

5.2 Example 

In Fig. 9, we use an example to illustrate the features of this planning system. The 
environment, which is the same as the one in Fig. 1, consists of two layers of objects with 
various sizes and heights scattered on the two main platforms connected by several columns 
of various heights. The C-obstacles (configuration space obstacles) with different bounding 
cylinders for forward walking and side walking are shown in Fig. 9 (a) and (b), respectively. 
Note that the narrow passage exists only when side-walk locomotion (Fig. 9. (b)) is used. 
The global path found by the planner is shown in Fig. 9 (c) and (d). In this example, the 
initial configuration is on the ground, and the goal is at some location on the second layer 
that is reachable only through the following passages: climbing upstairs to the left platform, 
changing locomotion to side walk to pass the narrow passage, crossing the columns to the 
right platform, and climbing upstairs again to the second layer on the right platform. The 
planning time for a typical run consists of two parts: preprocessing and search. In this 
example, the total preprocessing time is 251ms (constructing layer map takes 10ms, 
computing the reachable region takes 35ms, computing the potential field takes 74ms, and 
computing the instability map takes 132ms) and the search time is 15ms. 
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(c) (d) 

Fig. 8. Merging sparse layers, (a) and (b) are the 3D workspace with spiral stairs from 
different views (c) and (d) are the merged results for layer 1 and layer 2, respectively. 
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(e) (f) 

Fig. 9. (a) the C-obstacles for forward walking and (b) side walking, (c) the search result of 
the global path planner on layer 1 and (d) layer 2, (e) passing the narrow passage with side- 
walk motion, (c) feasible local motions generated to walk across the columns. 

In Fig. 10, we use an outdoor environment to illustrate the use of several types of 
locomotion. The environment, as shown in Fig. 10(a), is partitioned into three blocks by 
the river. Upper and lower blocks are connected through the lily pads floating on the 
river. The broken bridge between the left and the right block is the shortest path 
between them. However, the distance is too large for the humanoid to stride over. In 
this case, the humanoid can only use jumping to move over the broken bridge. In 
addition, the passage to the house is too narrow for the humanoid to walk through 
without switching to the side-walk locomotion because the narrow passage exists only 
when side-walk locomotion is used. The global path found by the planner is shown in 
Fig. 10 (b), the red portion of the path is frontal walking, yellow portion is jumping, and 
green portion is lateral walking. In this example, the initial configuration is on the left 
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bottom of the lower block, and the goal is in the yard of the house at the center of the 
right block that is reachable only through the following passages: climbing upstairs to 
the platform, detouring round the woods in the forest (Fig. 10(c)), striding over the lily 
pads (Fig. 10(d)), jumping over the broken bridge between the left and the right block 
(Fig. 10(e)), changing locomotion to side walk to pass the narrow fence (Fig. 10(f)), and 
finally reach the goal. The total planning time is 312ms (281ms for preprocessing and 
31ms for path searching). 




(e) (f) 

Fig. 10. An outdoor environment requiring the humanoid to use several motion skills to 
reach the goal. 
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5.3 Planning Performance Compared to the Traditional Approach 

As mentioned in the previous sections, traditional straightforward search algorithms 
may also be able to find paths with the striding ability by extending the range of the 
neighbor search. Compared to this traditional approach, our approach is superior in 
performance and extensibility. Assume that a normal walking gait size is 60cm, which is 
equal to 6 cells in search space with the current resolution. In the traditional approach, 
the number of neighbors that needs to be covered is 64 for each configuration. This 
number could be even larger if we allow the humanoid to have a larger gait size or if we 
use finer resolution for the search space. The search time for an example similar to the 
one shown in Fig. 9 is 1246ms with the traditional approach where 64 neighbors are 
visited at a time and totally 47952 nodes are visited. With our approach, the 
preprocessing takes 263ms, and the search takes only 34ms. The total number of visited 
nodes is 2612. The proposed planning approach is faster than the traditional approach 
by about an order of magnitude. In addition to the reduced number of visited neighbors, 
the new approach is fast also because the potential field computed with a correct 
reachability map is more effective. The traditional approach can only use the distance to 
the goal as a heuristic since the reachable regions in the reachability map are 
discontinuous. 

6. Conclusions and Future Work 

Building autonomous humanoid robots has been the goal of many applications in 
robotics and computer animation. Spatial reasoning is a key capability to enable a 
robot to accept high-level commands and move autonomously. In this chapter, we 
have described a motion-planning system that can generate feasible humanoid 
animations with versatile locomotion in a complex virtual environment. The focus has 
been put on designing a global path planner that can generate a gross motion plan 
respecting the environmental and the humanoid constraints. The reachability can be 
defined by the properties of humanoid and can be extended through locomotion 
abilities. With the new problem definition, we modify the Best-First planning 
algorithm to account for user preference on horizontal or vertical distances that a 
humanoid travels. We have extended the planning system to consider such abilities 
and demonstrated the efficiency and effectiveness of the planner by simulation 
examples. We believe that this work will inspire further studies on the interesting 
problem of computing humanoid motion automatically for movie generation or 
autonomous humanoid robots. 

In the global planner, collision detections for transition between different types of 
locomotion are done by checking the potential collisions with both types of motions. In 
fact, this assumption has over-simplified the transition problem since the bounding 
cylinder of a transition motion could be larger than either motion at both ends. Besides, 
not every transition between different types of locomotion is possible. In (Choi et al., 
2003), a predefined transition graph is used to check whether the transition between 
different locomotion is valid or not. In the future, we would like to construct a 
transition graph; every node is a type of locomotion and the edge between two nodes 
exists only if the transition is possible. Corresponding models for collision checks 
should be attached to the edges to ensure that the global planner can properly check the 
collisions. 
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A real human usually can avoid obstacles with various kinds of locomotion and body 
motions. In the future, we would also like to enable the humanoid robot with more 
locomotion abilities when moving to the goal. For example, a human can crawl or stoop to 
avoid upper-layer obstacles. In addition, in a complex environment, there could exist 
movable objects that can be moved away by pulling or pushing by the humanoid robot to 
make ways for itself to reach the goal. We will also consider this kind of manipulation and 
reasoning capability in the future. 
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